PARTIAL REINFORCEMENT:


A SELECTIVE REVIEW OF THE LITERATURE SINCE 1950¹


DONALD J. LEWIS²
Louisiana State University 


In May, 1950, Jenkins and Stanley 
published Partial Reinforcement: A 
Review and Critique, which is now a 
standard reference in the area. Its 
very excellence has probably in- 
hibited other workers in the area 
from attempting another review of 
the literature. The partial reinforce- 
ment literature, however, is so large 
and has increased at such a rapid 
rate, that it is difficult now to main- 
tain a perspective on the whole area. 
The present review is offered, with 
considerable trepidation, as an at- 
tempt to point out at least the major 
results and trends of the post-1950 
research. 

Even though the list of references is long, this review unfortunately
is not exhaustive. Some studies are not included because they were
believed by the writer to be insignificant, inconclusive, or badly
conceived. Probably other studies, perhaps excellent ones, simply have
been overlooked, although the writer has made a conscientious effort to
read every study on partial reinforcement that has been published in
major psychological and allied journals. Another large class of
excluded studies are those which involve a free responding situation.
This exclusion is purely arbitrary and does not mean that such studies
are considered to be of less significance, nor that different
principles necessarily will be required for their explanation. Only
discrete trial studies are considered, primarily because these happen
to be of major interest to the writer. A review of free responding
studies is badly needed. A very few studies before 1950 are considered
either because of their parametric design or their importance to
theory.

¹ This report was supported by a grant from the National Science
Foundation.

² The author wishes to express his appreciation to John W. Cotton of
Northwestern University and D. W. Tyler of Louisiana State University
who read the entire manuscript. Any inadequacies remaining are not to
be attributed to them.

This paper is organized around the 
major empirical variables that have 
been investigated in attempts to de- 
termine the effects of partial rein- 
forcement on extinction. The aim in 
the data section has been to deter- 
mine parametric relations between 
stimulus and response variables. 
Some studies whose primary purpose 
was to test a theoretical orientation 
do not allow such a determination, 
but they are considered to be val- 
uable apart from their theory. There- 
fore, they, too, are cited in the data 
section. Without their theoretical 
context these studies may appear 
somewhat disjointed, but it is some- 
times salutary to look at data only in 
relation to empirical variables. Other 
studies seem most important primarily for their contribution to
theory and they are considered in the
theory section. Some, of course, must 
be considered in both sections. Be- 
cause this paper is concerned with ex- 
tinction, acquisition phenomena are 
treated only incidentally. Perhaps a 
necessary introduction to this paper 
is a thorough reading of the Jenkins 
and Stanley (1950) paper. The issues 
and problems they discuss are not 
taken up again unless there is a clear 
need. 


DATA 


Percentage of reward. Following 
their review of the pertinent litera- 
ture, Jenkins and Stanley (1950) ar- 
rived at an empirical generalization 
which stated: “All other things
equal, resistance to extinction after 
partial reinforcement is greater than 
that after continuous reinforcement 
when behavior strength is measured 
in terms of single responses” (p. 222). 
Nine years and a great deal of re- 
search later this generalization still 
stands, perhaps more firmly than 
ever. Because this law of partial re- 
inforcement seems so well estab- 
lished, there seems little point here 
in noting those studies involving only 
two percentages of reinforcement 
whose main point has been to demon- 
strate again the PRE (partial rein- 
forcement effect). We will be pri- 
marily concerned in this section with 
studies that have attempted to de- 
termine a parametric law. 

Grant and Schipper (1952) used an 
eyelid conditioning situation with a 
light CS and an airpuff UCS and 
counted the percentage of CRs in ac- 
quisition and extinction. The per- 
centages of reinforcement used were 
0%, 25%, 50%, 75%, and 100%. 
The results indicated that the per- 
centage of CRs during acquisition 
was an increasing function of per- 
centage of reinforcement with the 
greatest response strength for the highest percentage of reward.
During extinction there was a rapid decrease in response strength for the
100% group, and less rapid for the 
others. The greatest resistance to ex- 
tinction was for the 50% and 75% 
groups, falling off for both the 100% 
and 25% groups. The 0% group 
showed practically no conditioning 
and therefore no resistance to extinc- 
tion. There was, then, a “hard core 
of resistance” to extinction in the re- 
gion between, say, 40% and 80% re- 
inforcement. 

Duplicating in design the eyelid 
study discussed above, Grant, Hake, 
and Hornseth (1951) used a verbal 
conditioning situation. During acquisition, the percentage of positive
responses was again an increasing function of percentage of
reinforcement, with each group emitting positive responses at about the
same rate as it received reinforcements. In extinction, however, the
25% group gave the greatest resistance to extinction, yielding the hump
of the ∩-shaped (inverted-U) function toward the lower percentages.

Lewis and Duncan (1956b) used a 
“one-arm bandit” slot machine, mod- 
ified so that payoffs could be con- 
trolled. Each payoff was worth 5¢ 
to the Ss, and the percentages used 
were 100%, 75%, 50%, 37.5%, 25%, 
12.5%, and 0%. The total number of 
plays to quitting was found to be an 
inverse function of the percentage of 
reward with the 100% Ss quitting 
first and the 0% Ss quitting last. 
There was no evidence for a ∩-shaped function.

In another experiment, with 0%,
11%, 33%, 67%, and 100% reward, 
Lewis and Duncan (1957) asked their 
Ss to state for each trial of the 9-trial 
acquisition series their “expectation” 
of winning or not winning on the next 
trial. These expectancies were quantified on a scale from 1 to 6,
with 1 representing a firm expectancy of not
winning and 6 a firm expectancy of 
winning. The results showed that ex- 
pectancies were a regular function of 
percentage of reinforcement both 
during acquisition and extinction, 
and that the expectancy of winning 
dropped off very rapidly during ex- 
tinction for the 100% group. This 
was also the group that quit first. In 
this case there was a slight drop for 
the 0% group, suggesting a ∩-shaped
function.

Using children from approximately 
five and one-half to six and one-half 
years of age in a partial reinforcement 
situation with plastic toys as reward, 
Lewis (1952) varied four percentages of reward (100%, 50%, 60%, and
0%) in a 10-trial acquisition series. He found no difference in
resistance to extinction between the 50% and 60% groups and between the
100% and 0% groups, although the latter two groups quit significantly
sooner than did the former, again a ∩-shaped function.

Five studies have been considered, 
each of which plotted at least four 
points along a percentage of rein- 
forcement dimension. The wide 
variety of experimental situations 
used has undoubtedly helped to obscure a parametric function. Even
so, a ∩-shaped function is found in four of the five studies. Grant and
Schipper (1952), Grant, Hake, and Hornseth (1951), Lewis (1952), and
Lewis and Duncan (1957) present evidence for such a ∩-shaped function
with depressions at both the high and low percentages. Because a
nonmonotonic function usually means
that at least two processes are operating, Grant and Schipper (1952)
guessed as to what these two processes might be. The first process,
they hypothesized, is a discriminative one. The higher the percentage
of reinforcement, the more the acquisition series should “stand out”
from the extinction series, and the less PRE should result. A
discrimination process thus results in a decreasing function of
percentage of reinforcement. The second process is a learning one. With
a response starting close to zero response strength, the greater the
percentage of reward, for equal numbers of trials below some limit, the
greater the response strength. Thus the learning process produces an
increasing function, and the discrimination process should produce a
trend in the opposite direction. The combination of these two results
in a ∩-shaped function.

If Grant and Schipper are correct, 
the point of inflection of the ∩ would
need to vary with the degree of 
learning. The greater the degree of 
learning, the more the point of inflec- 
tion should move toward the low end 
of the percentage scale. This is be- 
cause with a greater degree of learn- 
ing, the learning process should tend 
to drop out, leaving only the dis- 
crimination process in operation. 
Several percentages of reinforcement 
and numbers of acquisition trials 
need to be combined in the same ex- 
periment to verify this conjecture. 
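
The two-process conjecture can be made concrete with a toy sketch. This is not Grant and Schipper's model: the exponential learning curve, the linear discriminability term, the multiplicative combination, and the `rate` parameter are all assumptions, chosen only to show how one increasing and one decreasing process can yield a function with an interior peak, and how that peak could move toward lower percentages as acquisition is lengthened.

```python
import math


def learning(p, n_trials, rate=0.05):
    """Assumed response strength after n_trials at reinforcement proportion p.

    Increases with p and with n_trials, and saturates at 1.0.
    """
    return 1.0 - math.exp(-rate * p * n_trials)


def discriminability(p):
    """Assumed ease of telling extinction from acquisition (rises with p)."""
    return p


def resistance(p, n_trials):
    """Assumed resistance to extinction: strong response, hard-to-detect change."""
    return learning(p, n_trials) * (1.0 - discriminability(p))


def peak_percentage(n_trials):
    """Percentage (0..100) at which the assumed resistance function is greatest."""
    return max(range(0, 101), key=lambda pct: resistance(pct / 100.0, n_trials))


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, "acquisition trials -> peak at", peak_percentage(n), "%")
```

Under these assumed forms the peak lies at a lower percentage for the longer acquisition series, which is the direction of shift the conjecture predicts. The particular values carry no empirical weight.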

Pattern. Not only may reinforce- 
ment be given in different percen- 
tages, but within any percentage less 
than 100, the pattern may vary. 
Within 50% reinforcement, for ex- 
ample, the rewards may be given 
randomly, irregularly but not ran- 
domly, or regularly. And a large 
number of regular patterns are possi- 
ble, depending on the length of the 
acquisition series. A systematic ex- 
ploration of these variables is needed. 
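
The patterns named here can be written out as explicit trial sequences. The sketch below is illustrative only: the function names are hypothetical, starting each regular pattern with a reinforced trial is an assumption, and the "random" schedule is taken to mean a shuffled sequence with a fixed number of reinforced trials.

```python
import random


def single_alternation(n_trials):
    """Reinforced, nonreinforced, reinforced, ... (True = reinforced)."""
    return [i % 2 == 0 for i in range(n_trials)]


def double_alternation(n_trials):
    """Two reinforced, two nonreinforced, two reinforced, ..."""
    return [(i // 2) % 2 == 0 for i in range(n_trials)]


def random_schedule(n_trials, proportion=0.5, seed=None):
    """Fixed number of reinforced trials placed in shuffled order."""
    n_reinforced = round(proportion * n_trials)
    schedule = [True] * n_reinforced + [False] * (n_trials - n_reinforced)
    random.Random(seed).shuffle(schedule)
    return schedule


def transitions_to_reinforced(schedule):
    """Count nonreinforced trials that are immediately followed by a reinforced one."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if not a and b)


if __name__ == "__main__":
    for name, sched in [
        ("SA", single_alternation(8)),
        ("DA", double_alternation(8)),
        ("R", random_schedule(8, seed=1)),
    ]:
        print(name, "".join("R" if r else "N" for r in sched),
              transitions_to_reinforced(sched))
```

All three schedules deliver 50% reinforcement over eight trials, yet they differ in how often a nonreinforced trial is followed by a reinforced one, which is one of the pattern variables the studies below manipulate.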

Grant, Riopelle, and Hake (1950) 
set up random (R), single alternation 
(SA), double alternation (DA), and 
100% groups in an eyelid conditioning situation. At the end of
acquisition the 100% and the R groups were
about equal and superior to the DA 
and SA groups. During extinction 
the R group showed a very rapid de- 
crement in response magnitude and 
ended below the other groups. This 
is one of the few studies reporting 
faster extinction following partial reinforcement. There is no ready
explanation for this paradox.

Longnecker, Krauskopf, and Bitterman (1952) performed an experiment
comparing SA and R groups,
measuring the galvanic skin response 
(GSR). They found no statistically 
significant mean difference between 
R and SA at the end of acquisition, 
although the R group appears to have 
had a stronger response. Very in- 
teresting was the saw-toothed ac- 
quisition curve for SA. Responses 
were considerably stronger on rein- 
forced trials than nonreinforced trials. 
During extinction the R group was 
clearly and significantly more resist- 
ant to extinction than the SA group. 

Using a combination elevated run- 
way and a jumping platform with 
rats as Ss, Tyler, Wortz, and Bitter- 
man (1953) explored further the dif- 
ferences between SA and R. Again, 
there were no differences at the end 
of acquisition, and again the SA 
group was significantly less resistant 
to extinction than the R. Also, the 
saw-toothed acquisition function ap- 
peared for SA, developing about the 
60th trial. Faster runs were recorded 
after a nonreinforced trial. Earlier
in acquisition just the opposite was 
true; faster runs occurred after re- 
warded trials. A definite patterning 
effect is thus discernible in these two
studies, not only in terms of resis- 
tance to extinction, but also in terms 
of the acquisition function. 

Hake and Grant (1951) and Hake, 
Grant, and Hornseth (1951) were 
concerned with the patterning effects 
of blocks of reinforced and unreinforced
trials. They varied factorially
both the number of blocks (and 
therefore the number of transitions 
from nonreinforced to reinforced 
trials) and the number of trials in 
blocks. This procedure resulted in 
combining variables, since as the 
number of transitions increased and 
as the number of nonreinforced re- 
sponses per block increased, the total 
number of nonreinforced trials also 
increased. In one of the studies 
(Hake & Grant, 1951) an eyelid con- 
ditioning situation was used. No 
significant results were found. This 
may be because of the combination 
of variables mentioned above, or it 
may be that when working with com- 
plete blocks of reinforced and non- 
reinforced trials, a longer acquisition 
series is necessary. The acquisition
series used here included only 1, 3, or 5
transitions. In a second experiment
of identical design (Hake, Grant, & 
Hornseth, 1951) but with a verbal 
conditioning situation, both the num- 
ber of transitions and the number of 
reinforced trials were significant vari- 
ables. Three transitions resulted in 
greater resistance to extinction than 
either 1 or 5, and the fewer the num- 
ber of nonreinforced trials the greater 
the resistance to extinction.

Working with a modification of the
Humphreys’ board, Grosslight, Hall, 
and Murnin (1953) gave reinforce- 
ments in three different patterns. 
One group (RR) was reinforced on 
every trial, a second (RU) received 
reinforcements in a block followed by a single block of
nonreinforced trials, and the third (UR) had
all the nonreinforced trials occurring 
as a block in the middle of a sequence 
of reinforced trials. After five ac- 
quisition sessions of this kind, an ex- 
tinction session was given, and the 
number of “plus” responses determined for the three groups. The UR
group was more resistant to extinction than the other two, and the RU
group was more resistant to extinc- 
tion than RR. A group, it seems, 
whose acquisition series ends with 
reinforcement is more resistant to ex- 
tinction than a group whose acquisi- 
tion series does not end with rein- 
forcement. A similar finding was re- 
ported by Ishihara (1954). 

Grosslight and Radlow (1955) ob- 
tained similar results with a response 
reversal situation and albino rats. 
Response reversal was significantly 
slower in the UR condition than in 
the RR and RU conditions. For this 
study, however, there was no dif- 
ference between RR and RU. A 
likely explanation of this latter find- 
ing lies in the 24-hour interval sep- 
arating the sessions in the animal 
study, whereas the interval was of the 
order of seconds or minutes for the 
human study. With a short interval 
between sessions, the RU group also 
has a reward closely following non- 
reinforcement, although not as 
closely as for the UR condition. For 
a second study (Grosslight & Rad- 
low, 1957), with rats and a habit re- 
versal problem, only one nonrein- 
forced trial a session for three sessions 
was used in the RU and UR condi- 
tions. Again the UR was signif- 
icantly slower to reverse, and there 
was no difference between RU and 
RR. Again there was a 24-hr. interval between sessions. Finally, one
study (Lewis & Duncan, 1956b) shows that irregular variations of
pattern within a single percentage of reinforcement (25%) result in no
difference in PRE.

It seems clear, reviewing the 
studies on patterning, that SA results 
in less resistance to extinction than 
does R (Longnecker, Krauskopf, & 
Bitterman, 1952; Tyler, Wortz, & 
Bitterman, 1953). It also seems 
clear that blocks of nonreinforced 
trials which end a number of times
with at least one reinforced trial are
superior to continuous reinforcement 
and to those blocks which end in a 
nonreinforced trial (Grosslight & 
Radlow, 1957; Hake, Grant, & Horn- 
seth, 1951). Little more is known 
about patterning because even the 
more obvious variables and their 
combinations have not been studied. 

Secondary reinforcement. In the following discussion, Sᴰ will
represent a neutral stimulus serving as a discriminative stimulus which
is presented before the response. And Sʳ will represent the stimulus
used as a secondary reinforcer.³ Sʳ is presented after the response
whose strength it is hoped to strengthen or maintain. Primary
reinforcement (Sᴿ) is ordinarily necessary to make a stimulus serve as
Sᴰ or Sʳ, and Sᴿ may be presented according to many different
schedules. There are thus three variables, the Sᴰ, the Sʳ, and the Sᴿ,
which may be manipulated somewhat independently, and as a result there
are a number of different ways in which secondary reinforcement may be
combined with partial reinforcement. Because each of these ways may
bring about different results, the problem of the relationship of
secondary reinforcement to partial reinforcement fractionates into the
number of experimental designs by which the three may be combined.

Three principal experimental designs may be used during acquisition
to vary the percentage of presentation of Sᴿ and Sʳ. Design A-1:
Primary reinforcement is given only part of the time and Sʳ is
presented on every occasion. Design A-2: Whenever Sᴿ is given, Sʳ is
also presented, but when Sᴿ is not given, neither is Sʳ. Design A-3:
Primary reinforcement is presented on a certain percentage of trials
and so is Sʳ, but the two percentages are varied independently so that
Sᴿ may coincide with Sʳ and it may not. These designs are represented
in Table 1. Because the effects of “partial secondary reinforcement”
can only be determined when compared to “continuous secondary
reinforcement,” only those studies in which such a comparison is made
will be cited in this section.

TABLE 1

Experimental            Reinforcer
Design          Sᴿ          Sʳ
A-1             partial     continuous
A-2             partial     partial, contingent on Sᴿ
A-3             partial     partial, not contingent on Sᴿ

³ No position is implied on the issue of whether a stimulus must first
serve as an Sᴰ in order to serve as an Sʳ.
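
The three acquisition designs can be read as three rules for deciding, trial by trial, whether the primary reinforcer and the secondary reinforcer are presented. The sketch below is only an illustration of those rules: the function name, the default probabilities, and the use of independent random draws on each trial are assumptions, not a procedure taken from any of the studies cited.

```python
import random


def design_schedule(design, n_trials, p_primary=0.5, p_secondary=0.5, seed=None):
    """Return one (primary, secondary) pair of booleans per acquisition trial.

    design is "A-1", "A-2", or "A-3"; p_secondary is used only by A-3.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        primary = rng.random() < p_primary
        if design == "A-1":
            # Secondary reinforcer on every trial, primary only on some.
            secondary = True
        elif design == "A-2":
            # Secondary reinforcer presented exactly when the primary is.
            secondary = primary
        elif design == "A-3":
            # Secondary reinforcer drawn independently of the primary.
            secondary = rng.random() < p_secondary
        else:
            raise ValueError(f"unknown design: {design}")
        trials.append((primary, secondary))
    return trials


if __name__ == "__main__":
    for d in ("A-1", "A-2", "A-3"):
        print(d, design_schedule(d, 6, seed=0))
```

Only under the third rule can the secondary reinforcer appear on a trial without the primary reinforcer and also be absent on a trial with it, which is what distinguishes that design from the other two.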

In addition to these general meth- 
ods of combining primary and secon- 
dary reinforcement are two general 
methods of presenting Sʳ during extinction, or during the test trials.
The Sʳ may be presented on every trial (Design E-1) or only on part of
the trials (Design E-2). The studies available in this area will be
classified according to the design used.

There seems to be only one study in which Design A-1 has been used.
Peterson (1956) used a runway for acquisition, and a buzzer was
presented on every trial just as the S entered a delay chamber.
(Peterson was also interested in delay of reinforcement.) Tests for the
effectiveness of Sʳ were given in a bar pressing device in which S had
had no previous experience. Perhaps this procedure does not fit the
usual paradigm for extinction, but as a test for the effectiveness of a
secondary reinforcer it is certainly superior to ordinary extinction.
Peterson found no difference between an Sʳ associated with 50%
reinforcement and one associated with 100% reinforcement. There is also
no evidence that any secondary reinforcement effects occurred with
either percentage.

Notterman (1951) varied the number of nonreinforced trials in a
runway with the number of reinforced trials held constant. He presented
a light as Sʳ on each reinforced trial just as S entered the goal box.
On nonreinforced trials Sʳ was not presented. Thus Notterman used
Design A-2. Certain of his groups were then extinguished to the same
level (although Notterman’s argument for this is not as firm as would
be desirable) in the absence of Sʳ. Then Sʳ was reintroduced for
further trials but no primary reward was given. Response strength
increased as a direct function of the number of originally interspersed
nonreinforced trials. The smaller the percentage of reinforcement, the
greater the effectiveness of the Sʳ. The smallest percentage of
reinforcement used was 33%.

No study has been reported using 
Design A-3. It is, however, reason- 
able to guess that if Design A-3 is 
combined with Design E-2, Sʳ will be
most efficacious in maintaining a re- 
sponse or bringing one about anew. 

Three designs for combining partial reinforcement with secondary
reinforcement have been considered. For Design A-1, little can be said.
Peterson (1956) found no clear evidence for secondary reinforcement
even in a standard control group. Notterman (1951), with an A-2 design
and a nice parametric study, found greater effectiveness for Sʳ with
smaller percentages of reinforcement. Design A-3 is untried.

Successive acquisitions and extinctions. Perkins and Cacioppo (1950)
gave one group of animals 16 acquisition trials, only 8 of which were
reinforced, followed by 30 extinction 
trials. Seven days of this kind of 
training were given. A second group 
received the same succession of 
events except that all acquisition 
trials were reinforced. The results 
showed that the 50% Ss were more 
resistant to extinction throughout the 
extinction sequences, and the rate of 
extinction increased with successive 
extinctions. 

Lauer and Estes (1955) used a 
jumping stand and gave three series, 
each of 14 acquisition and 8 extinc- 
tion trials. For one group the first 
acquisition series was with 100% re- 
inforcement, the second with 50%, 
and the third with 100% again. The 
other group started with 50% and al- 
ternated on successive acquisitions. 
They found that successive extinctions
resulted in less extinction, both
in rate and terminal level. Extinc- 
tion rates were similar after 50% and 
100%, except that there was a 
greater decrement on the first ex- 
tinction day following 100% rein- 
forcement. 

Lauer and Estes (1955) pointed 
out that successive acquisitions and 
extinctions had a great deal in com- 
mon with typical partial reinforce- 
ment in that nonreinforced trials (ex- 
tinctions) follow reinforced trials 
(acquisitions). Razran (1955) has 
also made this point. The main dif- 
ference between the two is in the 
length of the sequence of reinforced 
and nonreinforced trials, which ac- 
tually makes successive acquisitions 
and extinctions identical to the pat- 
terning of partial reinforcement. 
Pursuing this comparison of succes- 
sive acquisitions and extinctions to 
pattern of reinforcement, Lauer and 
Carterette (1957) gave one group suc- 
cessive acquisitions only (A-A) and 
another group received successive acquisitions
and extinctions (A-E).
Both groups received spaced trials.
The A-A group was presumed to be 
analogous to 100% reinforcement and 
the A-E group was presumed to be 
analogous to 50% reinforcement. 
They found that the mean starting 
speed of the A-A group on all re- 
acquisition series was below that of 
the A-E group, even though the for- 
mer had had more reinforcements. 
This is a very interesting finding.
Its typical partial reinforcement 
counterpart should show that the 
reinforced trial following a nonrein- 
forced trial would be superior to a 
reinforced trial following another re- 
inforced trial. Amsel (1958) pre- 
sented data to indicate that this is 
what happens. Amsel interpreted his 
findings to indicate that nonrein- 
forcement has a frustration-drive ef- 
fect which serves to energize the be- 
havior on the next trial. However, it 
is hard to see how the frustration- 
drive could persist over the 30-min. 
intertrial interval used by Lauer and 
Carterette (1957). It is just this ef- 
fectiveness of partial reinforcement 
procedures on widely spaced trials 
that is most difficult to interpret. 
The Perkins and Cacioppo study 
(1950) showed decreased resistance 
to extinction with successive extinc- 
tions, while both Lauer studies 
(Lauer & Estes, 1955; Lauer & Car- 
terette, 1957) showed just the re- 
verse. It is interesting to note that 
other successive acquisition and ex- 
tinction studies (e.g., Bullock & 
Smith, 1953) report findings similar 
to Perkins and Cacioppo. The reason 
for the difference in results lies per- 
haps in the shortness of the extinc- 
tion series in the two Lauer studies, 
8 trials in one case and 12 in the other. The other studies in this
area used a much longer extinction period.

In any case, the Lauer studies point the way to a considerable
consolidation of partial reinforcement variables, in that successive
acquisitions and extinctions can be considered as an aspect of the
pattern
and percentage of reinforcement. If 
any intertrial interval phenomena 
should turn out to be reliable, they 
should also show up as a function of 
the interval between the successive 
acquisition and extinction series. 

Number of trials. Only a few studies 
have been concerned specifically with 
the relationship between partial re- 
inforcement and the number of ac- 
quisition trials. A large number of 
studies (e.g., Jenkins & Stanley, 
1950) however, have shown that ac- 
quisition responding is greater with 
100% than with partial reinforce- 
ment. A study by Weinstock (1958) 
amplifies this statement so that it 
holds only for the early acquisition 
trials. Weinstock gave 109 acquisi- 
tion trials and, for the later trials, the 
smaller percentage groups were per- 
forming better than the larger per- 
centage groups, although there was 
no regular order. During acquisition, 
then, there is an interaction between 
percentage of reinforcement and 
number of trials. 

Weinstock reports the usual PRE 
with relatively clear evidence that 
rate of decrement for the low per- 
centage groups is less than that for 
the high percentage groups. In addi- 
tion, the higher percentage groups 
achieve a different and lower 
asymptote than the lower percentage 
groups. Finally, Weinstock notes a 
significant increase in performance 
late in extinction for the 100% 
group. We are faced here with the 
anomalous situation that, at least 
for continuously rewarded groups,
rewards depress response strength 
late in acquisition and nonrewards 
increase response strength late in ex- 
tinction. 

Lewis and Duncan (1956a, 1958a) 
in two studies combined different
numbers of acquisition trials with different
percentages of reinforcement.
They found no interaction during ex- 
tinction for the two variables, but the 
larger number of acquisition trials, 
in both cases, resulted in quicker ex- 
tinction. Capaldi (1957, 1958) in two 
studies reports a very similar finding; 
the more acquisition trials, the faster 
the extinction. In his second study, 
Capaldi combined length of acquisi- 
tion with pattern of reinforcement. 
Some Ss received rewards on every 
other trial, and some received re- 
wards randomly. The alternately re- 
warded, long acquisition group ex- 
tinguished faster than the alternately 
rewarded, short acquisition group, 
and faster than either the short or 
long randomly rewarded groups. 
There was no difference between the 
irregularly rewarded groups as a function
of the length of acquisition.

The Lewis and Duncan (1956a,
1958a) and Capaldi (1958) studies 
differ on one point. Capaldi found 
decreased resistance to extinction 
after a long acquisition series only for 
regular reinforcement. Lewis and 
Duncan, however, found no excep- 
tion for irregular reinforcement. 
They stated that for a well-learned 
response the function of acquisition 
trials, as far as resistance to extinc- 
tion is concerned, is to establish a 
stable stimulus pattern; the more ac- 
quisition trials the stabler the pat- 
tern, and when extinction begins, the 
stimulus change will be greater. This 
should be true with regular and ir- 
regular reinforcement, although more 
acquisition trials would probably be 
necessary with the latter. Capaldi 
may not have given a sufficient num- 
ber of acquisition trials to decrease 
resistance for his irregularly re- 
warded group. Also, he presents no 
evidence that his groups were equal 
at the end of acquisition.

Spontaneous recovery. There is very
little information about spontaneous
recovery as affected by any variable, 
and certainly there is little known 
about the effect of partial reinforce- 
ment on spontaneous recovery. Per- 
kins and Cacioppo (1950), whose 
study has been briefly described 
above, comment simply, “spontaneous recovery was complete before
each reconditioning trial.” Notterman,
Schoenfeld, and Bersh (1952)
conditioned heart rate by means of 
electric shock. The partial group 
showed no evidence of extinction al- 
though the continuous group did. 
No spontaneous recovery was possi- 
ble for the partial group, nor was any 
found for the continuous group. 

Lewis (1956) with a straight alley 
found no differential effect of partial 
reinforcement on spontaneous recovery; spontaneous recovery occurred
for both partial and continuous groups. Lauer and Estes (1955) found
spontaneous recovery occurring from day to day following 100%
reinforcement, but not following 50%. In fact, there was a decrement in
response between daily blocks for the
50% group, but recovery seemed to 
occur within the daily blocks. 

The empirical evidence is too 
meager and conflicting to warrant 
any firm conclusion about the effects 
of partial reinforcement on spon- 
taneous recovery. 

Punishment. In the avoidance con- 
ditioning situation, S can avoid 
getting punishment by an appro- 
priate response. Thus, early in ac- 
quisition, S gets punished on some of 
the trials and not on others, a partial 
procedure. In the escape learning 
situation, S gets the punishment no 
matter what he does, and this is a 
continuous procedure. If, as is commonly conceived, shock offset is
reinforcing, one would expect, from the principle of partial
reinforcement, that extinction would be more prolonged after avoidance
conditioning
than after escape conditioning. Jones 
(1953), Logan (1951), and Sheffield 
and Temmer (1950) have shown this 
to be the case. Jones (1953) has also 
shown that an “intermittent escape” 
schedule, one that involves with- 
holding punishment on some trials, 
results in greater resistance to extinc- 
tion than orthodox escape. Wynne 
and Solomon (1955) have pointed 
out, in addition, that the avoidance 
extinction situation actually involves 
further learning of the instrumental 
response. They argue that two re- 
sponses are learned during the ac- 
quisition series, one is the instru- 
mental response and the other is a 
conditioned emotional response 
(CER). The CER has a longer la- 
tency than the instrumental re- 
sponse, and on most trials the in- 
strumental response occurs before the 
CER can begin. The S is removed 
from the conditioned stimulus before 
the CER can be evoked. On some 
trials, however, the latency of the 
instrumental response is sufficiently 
long for the CER to occur, and thus 
the instrumental response serves to 
reduce the CER, and the instru- 
mental response is reinforced. Be- 
cause this type of reinforcement oc- 
curs on only some of the trials, the 
instrumental response is actually 
partially reinforced during extinc- 
tion. 

Intertrial interval. Sheffield (1949) 
reported evidence to indicate that 
the PRE could be obtained if the 
acquisition interval was massed (15 
sec.) but not if it was spaced (15 
min.), and gave an “aftereffects” 
interpretation of her results. Since 
her conclusion is based on a pre- 
sumed interaction between percen- 
tage of reinforcement and acquisition 
interval, the analysis of variance was 
the appropriate statistical technique. 
Her analysis by means of t was, however,
suggestive of her conclusion.
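
The statistical point can be illustrated with a minimal sketch. A claim that the PRE depends on the acquisition interval is a claim about a difference between differences, which is what the interaction term of a factorial analysis evaluates; separate t tests within each interval do not test it directly. The cell means below are hypothetical values chosen for illustration and are not Sheffield's data; the function names are likewise assumptions.

```python
# Hypothetical cell means (e.g., trials to an extinction criterion).
# These numbers are illustrative only and are NOT taken from Sheffield (1949).
cell_means = {
    ("massed", 50): 30.0,
    ("massed", 100): 18.0,
    ("spaced", 50): 22.0,
    ("spaced", 100): 20.0,
}


def pre(means, interval):
    """Size of the PRE at one interval: 50% mean minus 100% mean."""
    return means[(interval, 50)] - means[(interval, 100)]


def interaction_contrast(means):
    """Difference between the PRE under massed and under spaced trials.

    A value near zero means the PRE is about the same at both intervals;
    this contrast is what the interval-by-reinforcement interaction tests.
    """
    return pre(means, "massed") - pre(means, "spaced")


print(pre(cell_means, "massed"))         # 12.0
print(pre(cell_means, "spaced"))         # 2.0
print(interaction_contrast(cell_means))  # 10.0
```

Whether a contrast of this size is reliable depends on the within-cell variability, which is why the analysis of variance, rather than inspection of the means, is needed.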

Two attempts to replicate the 
Sheffield study have been reported by 
Wilson, Weiss, and Amsel (1955). 
They tried to manipulate the 
strength of the aftereffects of re- 
warded trials by using dry food in 
one experiment and water in the 
other; Sheffield had used wet mash. 
The results obtained by Wilson, 
Weiss, and Amsel were in sharp dis- 
agreement with those of Sheffield. 
The 50% reinforcement conditions 
led to greater resistance to extinction 
independent of the acquisition in- 
terval. There was no significant in- 
terval by reinforcement interaction 
either by analysis of variance or anal- 
ysis of covariance. Lewis (1956) also 
replicated Sheffield’s experiment ex- 
cept that he used a 2 min. distributed 
interval. He obtained essentially the 
same results as those of Wilson, Weiss 
and Amsel. Again the PRE was obtained
whether the acquisition
trials were distributed or massed.

With an eyelid conditioning ap- 
paratus, Grant, Schipper, and Ross 
(1952) used a 2×2×2 factorial design
similar to that of Sheffield except
that the intertrial intervals were 10 
sec. or 40 sec. The extinction results 
were quite complex in that there was 
a triple interaction among the three 
variables of the experiment. The 
authors interpreted this interaction 
to mean that the superiority of the 
50% groups during extinction was in- 
creased when there was a change in 
the distribution of trials from acquisition
to extinction.

The Sheffield design was also used 
by Grant, Hornseth, and Hake (1950) 
in a light-guessing experiment using 
a Humphreys’ board (Humphreys, 
1939). The intertrial intervals were 5 
sec. and 45 sec. Again the PRE was 
obtained independent of the acquisition
interval. Tyler (1956) in a discrimination
situation found the PRE
after a 15 min. acquisition interval, 
and Weinstock (1954, 1958) in two 
studies found the PRE after a 24-hr. 
interval between acquisition trials. 
The only study not showing the PRE 
after spaced acquisition trials was 
that of Rubin (1953), but it is im- 
possible to separate the effects of the 
intervals from possible other effects 
such as those of secondary reinforce- 
ment, and little can be concluded 
from this study. 

The conclusion seems to be rather 
firm that the PRE obtains whatever 
the spacing of the acquisition trials. 

Drive. There are only two papers 
directly relevant to the effects of 
different drive levels on partial rein- 
forcement. One by Lewis and Cotton 
(1957) is essentially negative. Using 
1, 6, and 22 hr. of food deprivation 
with 50 and 100% reinforcement for 
each drive level, they found that the 
interaction between the two variables 
did not attain significance. 

The other study relevant to this 
variable is that of Linton and Miller 
(1951). Two groups of hungry rats 
received acquisition training with 
either 100% or 50% reinforcement. 
Each of these two groups was sub- 
divided into two groups during ex- 
tinction, one of each pair being ex- 
tinguished while satiated, the other 
under the acquisition drive. Although
appropriate for a factorial
analysis of variance, the data were
analyzed by t tests. The results indicated
that the group partially reinforced
and extinguished under the
same drive as used during acquisition 
was more resistant to extinction than 
the other three, which were not dif- 
ferent from each other. In part, the 
results were interpreted in terms of 
drive-stimulus generalization. An- 
other part of the explanation had to 
do with the absence of a frustration 
drive for the partial-satiation group.
It was believed that the absence of 
the frustration drive for the satiation 
group prevented the PRE from ap- 
pearing. 

Because a motivational interpreta- 
tion of the PRE is not uncommon 
(Amsel, 1958), more studies are
needed combining drive with dif- 
ferent schedules of reinforcement. 

Discrimination. In one sense, all 
studies using a choice situation and 
the noncorrection method involve 
partial reinforcement, because an in- 
correct trial is never reinforced. As 
the number of correct trials increases, 
so does the percentage of reinforcement.
Lewis and Cotton (1958) have
pointed out the complexity of the
T maze as far as partial reinforce- 
ment is concerned. The two arms and 
the stem of the T maze each have a 
different percentage of reward obtain- 
ing. The correct arm is rewarded 
100% of the time. The incorrect arm
is rewarded 0%, and the stem is rewarded
at varying percentages, low during 
the early trials and almost 100% as 
the response is learned. Discrimina- 
tion situations have thus afforded a 
difficult arena for the study of partial 
reinforcement. But because choice 
situations are so commonly used and 
because partial reinforcement is such 
an effective variable, studies are 
needed to tease out the effect of the 
different percentages of reward in 
the different sections of the appara- 
tus. At a minimum, extinction run- 
ning speeds should be determined 
separately for the stem and two arms 
of the T maze. It might also be de- 
sirable to duplicate in a runway the 
pattern of reinforcement usually obtained
in the stem of a T.

Some information about partial re- 
inforcement is available from re- 
sponse reversal studies, even though 
response reversal does not qualify as
an orthodox extinction operation.

Wike (1953) showed that a re- 
sponse acquired under conditions of 
partial reinforcement is more resist- 
ant to reversal training than a con- 
tinuously reinforced response. This 
finding has been confirmed by Gross- 
light and Radlow (1955), Grosslight, 
Hall, and Scott (1954), and Kendler 
and Lockman (1958), but not by 
Buss (1952). 

Babb (1956) has shown that an 
irrelevant stimulus associated with 
reward 70% of the time facilitates 
later performance when the irrelevant 
stimulus becomes relevant. When 
the irrelevant stimulus is associated 
with reward 50% of the time, it has 
no effect on later learning. When 
associated with reward 30% of the 
time, later learning is inhibited by 
this previously irrelevant stimulus. 

In general, partial reinforcement 
retards later response reversal. 

Generalization and complex stimuli. 
In a study investigating the gradient 
of primary stimulus generalization to 
tones, Wickens, Schroder, and Snide 
(1954) established conditioned gal- 
vanic reactions to one of three tones, 
each of which was separated by 25 
jnd’s. For one group, in addition to 
the tones, a click was randomly pre- 
sented 12 times without reinforce- 
ment. 

During test trials a gradient of 
generalization was found for the 
group without clicks, but no gradient 
was discovered in the group with 
clicks. The failure to obtain a 
gradient for the click group seemed 
attributable to a generally high re- 
sistance to extinction. 

Other evidence is available (Brown,
1947) to indicate that the generalization
gradient is very broad on the
first test trials, becoming sharper 
with further test trials. Because the 
test trials are usually nonreinforced, 
generalization, and of course discrimination,
is at least a by-product
of extinction. A variable which in- 
creases resistance to extinction 
should, then, decrease any generaliza- 
tion gradient. From this, one would 
conclude that partial reinforcement 
should increase the amount of gen- 
eralization, which is indicated by the 
Wickens, Schroder, and Snide (1954) 
study. Further evidence on this im- 
portant problem is needed from other 
experimental situations. 

The experimenters later reasoned 
that “if a complex stimulus is pre- 
sented along with reinforcement and 
an element of the complex is pre- 
sented at some other time without 
reinforcement, then this latter action 
increases the resistance to extinction 
of the complex stimulus itself” 
(Wickens & Snide, 1955, p. 257). 

An experiment (Wickens & Snide, 
1955) was undertaken to test this 
hypothesis directly. The general 
procedure consisted of employing as 
the CS a light and a tone which came 
on simultaneously. Two groups were 
conditioned and extinguished to this 
stimulus complex. For the experi- 
mental group one or the other aspect 
(the light alone or tone alone) of the 
total complex was occasionally pre- 
sented during the training without 
reinforcement. According to the 
hypothesis, the experimental group 
should show greater resistance to ex- 
tinction than the control. This 
hypothesis was supported by the
data. 

Magnitude of reward. Lewis and 
Duncan (1956b) combined different 
magnitudes of reward with partial re- 
inforcement using a modified slot 
machine. Their rewards were 1, 10,
25, and 50 cents for each rewarded 
acquisition play. There was no 
significant interaction between 
amount of reward and the five different
percentages of reward used in
this situation. Hulse (1958), how- 
ever, with an enclosed alley and 
magnitudes of .08 gm and 1.0 gm 
pellets, found a larger PRE with the 
larger sized pellet. The two experi- 
ments differ so greatly that it is al- 
most futile to guess at the crucial 
difference. One possibility, however, 
lies in the intertrial interval which 
Hulse believes important for reward- 
percentage interactions. Hulse used 
a 24-hr. interval and Lewis and Dun- 
can used a massed trial situation. 
Also, there may again be a difference 
between a performance situation like 
Lewis and Duncan's and essentially 
a learning one like Hulse’s. 

Confinement. In addition to magnitude,
Hulse also varied the length of
the goal box confinement. He found 
that the PRE was less if there was a 
change in confinement from acquisi- 
tion to extinction. Different extinc- 
tion confinements had no effect, 
confirming a previous study by 
Hulse and Stanley (1956). Acquisi- 
tion confinements were not effective 
by themselves either.

Variable delay of reward. Crum, 
Brown, and Bitterman (1951), in 
testing one of the theoretical notions 
of Sheffield (1949), introduced a new 
and interesting variable into the 
partial reinforcement literature. For 
one group of Ss, reward was given 
immediately. For a second group, 
reward was given immediately on 
half of the trials and it was delayed 
for 30 sec. on the other half. The 
trials on which reward was delayed 
were irregularly interspersed among
the others. They found the typical 
PRE for the variable delayed group. 
Confirmation was soon made avail- 
able by Scott and Wike (1956). 
Logan, Beier, and Kincaid (1956) 
showed that the PRE occurred when 
delays were variably 0 and 30 sec., 
but not when they were 0 and 9 sec.
Kintsch and Wike (1957) used a T 
maze and gave one group immediate 
reward on the correct trials, another 
group 10-sec. delays on half of the 
correct trials, and a third group 30 
sec. delays on half of the correct 
trials. The results were essentially 
the same as those of Logan, Beier, 
and Kincaid (1956): the PRE appeared
with a 30-sec. variable delay,
but not with 10 seconds. Fehrer (1956),
however, had found increased resist- 
ance to extinction with constant ac- 
quisition delays of 10 seconds. 

Wike and McNamara (1957) used 
an alley runway, and varied per- 
centage of delay. One group was de- 
layed on 25% of the trials, a second 
group was delayed on 50% of the 
trials, and a third group was de- 
layed on 75% of the trials. All delays 
were 30 seconds. On the other trials 
Ss received reward immediately. 
The 25% group extinguished faster 
than the other two, which were not 
significantly different. Apparently, 
reward must be delayed on more than 
25% of the trials for the PRE to oc- 
cur. It would be interesting to know 
the shape of the function between 
25% and 50%. 

Peterson (1956) combined partial 
delay with partial reinforcement in 
an interesting experiment. One 
group was reinforced on every trial 
with no delay. Another also had 
100% reinforcement with delays on 
various trials of 0, 10, 20, and 30 sec. 
A third group had 50% reinforce- 
ment with delays of 0, 10, 20, and 
30 sec. on the reinforced trials. A 
final group had 50% reinforcement 
with no delay on the reinforced trials. 
The 50% group with delay was most 
resistant to extinction. The 100% 
group with delay and the 50% group 
without delay were about equal and 
next most resistant to extinction. 


Least resistant to extinction was the 
100% no delay group. 

In conclusion, it seems that partial 
delays of 30 sec. result in increased 
resistance to extinction, but partial 
delays of 10 sec. or less do not. The 
delays must occur on more than 25% 
of the trials, and when partial rein- 
forcement is combined with partial 
delay, resistance to extinction is the 
greatest. 

Situations. Having established an 
empirical phenomenon in one experi- 
mental situation we take some prac- 
tical, and sometimes theoretical, in- 
terest in determining the variety of 
situations in which the phenomenon 
holds. In this section an attempt will 
be made to review quickly some of 
the different experimental situations 
in which the PRE has been explored. 

Lewis (1952) used children in a 
button pushing situation. The re- 
wards were plastic cowboys and, in 
a 10-trial acquisition series, were 
given 100%, 60%, 50%, and 0% of 
the time. There was no difference in 
PRE between the 50% and 60% 
groups, nor between the 100% and 
0% groups, but the former two 
groups were superior to the latter 
two. Fattu, Mech, and Auble (1955) 
with a similar situation measured the 
number of responses to extinction 
after 100%, 50%, and 25% rein- 
forcement. The 25% group was con- 
siderably the most resistant to ex- 
tinction, followed by the 50% and 
100% groups in that order. Fattu, 
Auble, and Mech (1955) were not 
able to repeat their results although 
the same trend was apparent. 

Goss and Rabaiola (1952) had Ss 
learn one of three nonsense syllables 
to a color stimulus. The correct re- 
sponse was reinforced with a buzzer 
100%, 75%, and 50% of the time 
for the different groups. The speed 
of attainment of a criterion of learning
was an increasing function of the
percentage of reward. After the cri- 
terion was reached the reward condi- 
tions were switched. Half of each 
acquisition group was given 0% re- 
ward, and the reward percentage was 
halved for the other. The mean 
number of correct responses decreased
faster for the 50% to 0%
group than for the 100% to 0%
group. This runs counter to the usual 
PRE. Also, Goss and Rabaiola re- 
port an increase in correct responses 
after about eight trials of no reward. 

Hirsch (1957) presented his Ss 
with a series of words to which they 
were to respond with a number. The 
correct number was one less than the 
number of letters in the words. He 
reinforced his Ss either 100% or 67% 
of the time for the correct responses. 
All reinforcement was omitted after 
a criterion was attained. Again there 
was no significant PRE. Perhaps, if 
more of a “persistence” measure of 
extinction had been used in these two 
studies, the typical PRE would have 
occurred. 

Kanfer (1954) and Spivok and 
Papajohn (1957) varied percentage 
of reinforcement in an autokinetic 
situation. A restricted class of S’s 
verbal statements about movement 
was selected for reinforcement. Kanfer
rewarded these responses at 100%,
67%, 50%, and 0% for different 
groups, bringing all but the 0% group 
to the same criterion of performance. 
The reward was removed. The 
typical PRE was found in that the 
greater number of “critical” re- 
sponses was found in the partial 
group. The S’s awareness of the re- 
inforcement was not an essential 
condition for the PRE. Kanfer also 
pointed out that the rate of extinc- 
tion may be as important as the total 
number of responses emitted. Spivok 
and Papajohn (1957) compared 100%
reinforcement with a variable interval
schedule, finding greater PRE for
the latter. 

Finally, Lewis and Cotton (1958) 
found the PRE in a T maze with a 
correct turn response measure. Their 
animals had partial reinforcement 
administered by being placed in the 
goal by hand with food present on 
50% of the placements. However, 
both the 50% and 100% groups were 
worse than a control group that did 
not receive nonresponse reinforce- 
ment. 


THEORY 


In 1950, discussing the various 
theories of partial reinforcement, 
Jenkins and Stanley (1950) enumerated
five essentially different
theories. These were: (a) response 
unit, (b) aftereffects, (c) discrimina- 
tion, (d) secondary reinforcement, 
and (e) expectancy. All of these 
remain today as viable entries in the 
partial reinforcement sweepstakes al- 
though the response-unit hypothesis 
is rarely used for discrete trial phe- 
nomena, and the expectancy hypo- 
thesis has been much battered of 
late and seems to have no articulate 
supporter remaining. In addition, 
there are two new entries since 1950. 
These are: (a) a competing response 
theory of Weinstock (1954), and one 
by Hulse and Stanley (1956), and 
(b) a mediating response theory, es- 
poused by Amsel (1958), Kendler, 
Pliskoff, and D'Amato (1957) and 
Logan, Beier, and Kincaid (1956) 
among others. 

Aftereffects. One of the first, and 
probably still as good as any, state- 
ments of the aftereffects theory is that 
of Sheffield (1949). She pointed out 
that the aftereffects of reinforcement 
are quite different from the after- 
effects of nonreinforcement. When 
reinforcement occurs, S would have a 
food taste in the mouth, perhaps food
particles, and still other stimuli asso- 
ciated with eating. After a nonrein- 
forced trial the aftereffects would in- 
clude frustration, searching, etc. Ob- 
viously the stimuli following nonrein- 
forcement would be very different 
from those following reinforcement. 
If the stimulus aftereffects of non- 
reinforcement are still present on the 
next trial, and the next trial results 
in reinforcement, then the instru- 
mental running response would be 
conditioned to the aftereffects of non- 
reinforcement, and the S would ac- 
tually learn to respond to the stimuli 
of nonreinforcement. Since the ex- 
tinction stimuli are those of nonrein- 
forcement, S has learned to respond 
during extinction. Those Ss who are 
rewarded on every trial never have 
the opportunity to respond to non- 
reinforcement cues and thus have 
not learned to respond to extinction 
stimuli. 

But this is only the learning factor 
of the aftereffects theory as Sheffield 
describes it. There is also a primary 
generalization factor. For the con- 
tinuously reinforced Ss, the advent 
of extinction introduces new stimuli 
for the first time, those from non- 
reinforcement. With the introduc- 
tion of new stimuli there will be a 
response decrement due to primary 
stimulus generalization. For the 
partially reinforced Ss there would 
be no new stimuli introduced with 
extinction, since nonreinforcement 
had occurred repeatedly during the 
acquisition. 

To demonstrate the adequacy of 
her reasoning, Sheffield ran 50% and 
100% groups, factorially combined 
with massed and distributed acquisi- 
tion trials. With massed trials the 
aftereffects of nonreinforced trials 
should still be present at the initiation
of the succeeding trial, as the
theory requires. With distributed
trials this would not be the case. Her
results probably confirmed her theory,
for the PRE seemed to appear, as
tested by t, only after massed trials
and 50% reinforcement. 

Following some implications of 
Sheffield’s hypothesis, Grosslight and 
Radlow (1955) and Grosslight, Hall, 
and Murnin (1953) found that several
series of trials containing a single
nonreinforcement would result in the PRE, if
the nonreinforcement were followed
by a reinforcement. The authors interpret
their results in terms of a
Sheffieldian aftereffects theory, in 
that the aftereffects of the nonrein- 
forced trials are presumed to be con- 
ditioned to the instrumental response 
because they are followed by a rein- 
forcement, but it should be noted 
that a discrimination hypothesis 
would handle the results equally well.

Linton and Miller (1951) found no 
PRE when extinction, after partial 
reinforcement under normal depriva- 
tion conditions, was carried out under 
conditions of drive satiation and 
with the reward present on each 
trial. They reasoned that this was 
because there were very different 
aftereffects occurring during satiated 
“extinction” and that the instru- 
mental response had not been con- 
ditioned during acquisition to these 
aftereffects. 

As noted in the section on inter- 
trial interval, Wilson, Weiss, and 
Amsel (1955) repeated Sheffield’s 
study twice, with minor variations 
and without her results. In one case 
the variation was such as to enhance 
the aftereffects. In the other, the 
variation served to reduce the after- 
effects, but in neither case was the 
massed acquisition-partially reinforced
group more resistant to extinction.
Lewis (1956) also repeated
Sheffield’s study using a 2-min., instead
of a 15-min., spaced interval.
The graph of his results appeared to 
be Sheffieldian, but the appropriate 
statistical analysis showed no signifi- 
cant effect. 

Continuing the attack on the aftereffects
hypothesis, Tyler (1956) found
the PRE in a discrimination situation
even after a 15-min. acquisition
interval and, even more devastating, 
Weinstock, on two occasions (1954,
1958), has found the PRE even when
acquisition trials were spaced 24 
hours apart. This suggests that long 
range aftereffects are more important 
than previously believed. 

Tyler, Wortz, and Bitterman 
(1953) reasoned that if Sheffield were 
right, a simple pattern—alternating 
reinforcement with nonreinforce- 
ment—should give a greater PRE 
than a random pattern, because al- 
ternation maximizes the number of 
times nonreinforcement follows rein- 
forcement. Their results showed just 
the opposite; alternation resulted in 
quicker extinction than did the ran- 
dom pattern. 
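
The logic of this prediction can be sketched by counting transitions. On Sheffield's account, as described above, the critical event is a nonreinforced trial (N) that is immediately followed by a reinforced trial (R). The sketch below is illustrative only: the 20-trial length, the choice to begin the alternating series with N, and the function names are assumptions and are not taken from Tyler, Wortz, and Bitterman.

```python
import random


def count_n_to_r(schedule):
    """Count the trials on which R immediately follows N."""
    return sum(
        1
        for prev, cur in zip(schedule, schedule[1:])
        if prev == "N" and cur == "R"
    )


def alternating(n_trials):
    """Strict alternation, beginning with a nonreinforced trial (assumed)."""
    return ["N" if i % 2 == 0 else "R" for i in range(n_trials)]


def random_half(n_trials, rng):
    """A 50% schedule with the reinforced trials placed in random order."""
    schedule = ["R"] * (n_trials // 2) + ["N"] * (n_trials - n_trials // 2)
    rng.shuffle(schedule)
    return schedule


alt = alternating(20)
rnd = random_half(20, random.Random(0))

print(count_n_to_r(alt))  # 10: every N is followed by an R
print(count_n_to_r(rnd))  # never more than 10 for a 20-trial 50% schedule
```

Because each N can be followed by at most one R, no 50% schedule of the same length can exceed the alternating count, which is why an aftereffects account predicts at least as large a PRE after alternation as after a random pattern.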

Another ingenious attack on the 
aftereffects theory was performed by 
Crum, Brown, and Bitterman (1951). 
If the reward were delayed after each 
trial, they argued, the aftereffects for 
the succeeding trial would always be 
those of reinforcement. They compared
a group which received delay of
reward on some of its trials and im- 
mediate reward on others with a 
group that received immediate reward
on all trials. The partial delay
group showed the PRE very nicely.
Peterson (1956), Logan, Beier, and 
Kincaid (1956), and Wike (1953), 
have confirmed the results of Crum, 
Brown, and Bitterman. 

Fehrer (1956) performed some interesting
variations on the delay of
reward. She compared delays given
before the reward with delays given
after the reward. For the latter, Ss 
were simply left in the goal box for a 
specified period of time. She found 
that both delay groups extinguished 
slower than groups without delay. 
And the delay-before-reward group 
extinguished slower than the delay- 
after-reward group, again counter to 
an aftereffects notion. 

Katz (1957) ran three groups of 
animals in a test of the aftereffects 
notion. The acquisition trials were 
given in two quite different runways. 
One trial was given on the first run- 
way, followed immediately by a trial 
on the second runway. One group 
(P-C) was given partial reinforce- 
ment on the first runway and con- 
tinuous reinforcement on the second. 
A second group (C-P) was given just 
the reverse: continuous reinforcement 
on the first and partial on the second. 
The final group (C-C) was given con- 
tinuous reinforcement on both run- 
ways. For the P-C group, all non- 
reinforced trials occurring in the first 
runway were followed by reinforced 
trials in the second runway. For the 
C-P group there was no similar op- 
portunity for conditioning the after- 
effects of nonreinforcement. All ex- 
tinction trials were given in the sec- 
ond runway. The results showed that 
the C-P group was slowest to extinguish,
followed by the P-C group, and the
C-C group extinguished most rapidly.
This would indicate that other
factors than aftereffects are most
important in bringing about the PRE.

The only conclusion that can be 
drawn from these experiments is 
that, at best, Sheffieldian aftereffects 
are not important contributors to the 
PRE, and they probably have no 
effect whatsoever. 

Discrimination. The discrimina- 
tion theory, which was apparently 
first advanced by Mowrer and Jones 
(1945), has had the advantage of being
less specifically stated than the
aftereffects theory, and the longevity 
of a theory is apparently inversely 
related to the specificity with which 
it can be stated and tested. In general 
the discrimination theory states that 
resistance to extinction is a function 
of the similarity of the acquisition 
stimuli to the extinction stimuli. The
more similar the stimulus conditions 
are in the two situations, the greater 
the resistance to extinction. The 
problem then becomes one of stating 
and demonstrating the variables of 
which similarity may be said to be a 
function. Once lawful relationships 
are obtained between similarity vari- 
ables and behavior, any controversy 
over whether discrimination theory 
is a perceptual one, or refers to some- 
thing going on in the rat’s mind, is 
superfluous. The major task for dis- 
crimination theorists, then, is one of 
stating specifically what variables de- 
termine similarity. 

Probably the most vigorous sup- 
porters of the discrimination hypoth- 
esis have been the Texas group, al- 
though their research has been aimed 
more at disproving the aftereffects 
theory than it has at giving support 
to a discrimination theory. Longe- 
necker, Krauskopf, and Bitterman 
(1952) and Tyler, Wortz, and Bitter- 
man (1953) showed that a simple al- 
ternating pattern of reinforcement 
and nonreinforcement resulted in 
quicker extinction than a random 
pattern. Since this was contrary to 
an aftereffects theory, they argued, 
there must be some serial patterning
that occurs, enabling Ss to discriminate
the acquisition series and to
stop responding quickly when it
ceases.

Bitterman, Fedderson, and Tyler 
(1953) and Elam, Tyler, and Bitterman
(1954) gave rewards in one end
box and nonrewards in a very different
one. One group was extinguished
with the rewarded end box
present and another group was ex- 
tinguished with the nonrewarded 
box. Presumably, the stimulus situa- 
tions from acquisition to extinction 
would be more similar when the non- 
rewarded end box was present be- 
cause these stimuli had always in the 
past accompanied nonrewards. Us- 
ing the rewarded end box during ex- 
tinction would bring about a stimulus 
change because these stimuli had al- 
ways accompanied rewards. The pre- 
dicted results occurred, and were in- 
terpreted according to the discrim- 
ination theory. The authors tended 
to speak as if the discrimination were 
perceptual in some fashion, but, of 
course, all they could do was to relate 
behavior to stimuli, in this case 
stimulus change. There is no need to 
introduce an intervening perceptual 
process to handle these data.

Monkeys were given their choice, 
by Elam and Tyler (1958), between 
two stimuli, A and B. For one group, 
the A stimulus was rewarded 60% 
of the time and the B stimulus was 
rewarded 40% of the time. For the 
second group, the A stimulus was 
also rewarded 60% of the time but 
the B stimulus was rewarded 0% of 
the time. During extinction, the A 
stimulus was rewarded 0% of the 
time and the B stimulus was re- 
warded 100% of the time for both 
groups. Results showed that during 
acquisition, A was much more pre- 
ferred by the second group, but dur- 
ing extinction, A extinguished much 
more rapidly for the second group. 
Elam and Tyler attribute the much 
faster extinction of A for the second 
group to the greater stimulus change 
from acquisition to extinction occurring
for this group than the other.

Goodnow and Pettigrew (1956)
presented a somewhat similar problem
to four groups of human Ss. The
apparatus was a “two-armed bandit.”
One group received reward for an LR
(left-right) pattern of responses, the 
second received reward for an LL pat- 
tern, a third was given an LLR pat- 
tern, and the fourth was given a ran- 
dom pattern. After an acquisition 
series of trials, all groups were given a 
random pattern for a while and then 
all groups were switched to a final 
LR pattern. Both the initial LR and 
LL groups learned the final pattern 
faster than the random group. 
(Many of the LLR group did not 
learn the initial pattern.) Goodnow 
and Pettigrew concluded that the 
final pattern was made more dis- 
criminable because of the stability of 
the initial response pattern. Also, 
there was less of a stimulus change 
for the two initially systematic 
groups than for the initially random 
group. 

McClelland and McGown (1953) 
gave one group of animals irregular 
reinforcement in various parts of a 
circular shaped apparatus. Another 
group always found the reinforce- 
ment in the same place. Results 
showed that the inconsistently re- 
warded group was more resistant to 
extinction. The authors concluded: 

“Omission of reinforcement changes
the cue pattern less if the original 
cue pattern is complex or variable, 
than if it is simple and invariant.” 

Sheffield and Temmer (1950) have 
shown that the superior resistance to 
extinction of avoidance learning to 
escape learning can be given a partial 
reinforcement interpretation. Jones 
(1953) found that an “intermittent 
escape” group, one that was not
punished on every trial but still could 
not learn to avoid, was more resistant 
to extinction than an orthodox escape 
group. Also a “limited avoidance”
group extinguished slower than the
escape group. “Limited avoidance”
referred to a condition in which the 
S was placed on an initially un- 
charged grid, but the current would 
be turned on before S could get off 
the grid. The faster S moved the less 
shock he would take, and in this 
sense it was an avoidance condition, 
but even so, S would always get the 
punishment. The “limited avoid- 
ance” group was also more resistant 
to extinction than the pure escape 
group. The acquisition conditions, 
according to Jones, were more like 
the extinction conditions for the “intermittent
escape” and the “limited
avoidance” conditions than they 
were for the escape conditions. 

In several studies Lewis and Dun- 
can (1956a, 1956b, 1957, 1958a, 
1958b) found that 0% reinforce- 
ment showed more resistance to ex- 
tinction than 100% reinforcement, 
and more than most other percent- 
ages of reinforcement. Their situa- 
tion, a “one-armed bandit,” differed 
from most others in that no learning 
was involved in the performance of 
the lever pulling response. All Ss, on 
coming into the experimental room, 
already knew how to pull a lever. 
The experimenters argued, therefore, 
that the acquisition series served 
primarily to set the extinction series 
apart, and that 0% acquisition was 
most like the extinction series. In one 
of these studies (1958a) they found, 
contrary to most previous results,
that extinction was quicker after a
long acquisition series. Again they
attributed this to the pure perform- 
ance situation in which a relatively 
long acquisition series would serve to 
make the initial stimulus situation 
more stable. 

Somewhat similar results have
been reported by Capaldi (1957,
1958).

Brand, Woods, and Sakoda (1957)
in a two alternative situation varied 
both the difference in percentage of 
reinforcement and the difference in 
the ratio of reinforcement. The
difference, for example, between 75%
and 25% is 50%, and the ratio is
three to one. The difference between
100% and 50% is also 50%, but now
the ratio is two to one. They found
that extinction is quicker when the
percentage difference is great (a
finding in line with a discrimination
hypothesis), but that the ratios had
practically no effect.
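
The distinction between the two measures can be made concrete with a short sketch. The function name below is a hypothetical helper introduced only for illustration; the percentage pairs are the ones given as examples in the text, and nothing here reproduces the original study's procedure.

```python
def compare_schedules(high_pct, low_pct):
    """Return (difference in percentage points, ratio) for two
    reinforcement percentages. Hypothetical helper for illustration."""
    difference = high_pct - low_pct
    ratio = high_pct / low_pct
    return difference, ratio


# The two pairs used as examples in the text: the differences match,
# while the ratios do not.
for high, low in [(75, 25), (100, 50)]:
    diff, ratio = compare_schedules(high, low)
    print(f"{high}% vs {low}%: difference = {diff} points, ratio = {ratio:g} to 1")
```

Holding the difference constant while the ratio changes, as in these two pairs, is what allows the two variables to be separated.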

Fehrer (1956), who found that de- 
lay of reinforcement leads to in- 
creased resistance to extinction, also 
interprets her study according to a 
discrimination theory in that having 
a delay come before a reward affords 
an experience—that of no reward— 
much like the initial experience dur- 
ing extinction. 

All of the studies that have been 
reviewed in this section so far present 
evidence that has been interpreted as 
favorable to discrimination theory. 
There are two recent studies, how- 
ever, that present a considerable 
hurdle for discrimination theory to 
jump. Marx (1958) gave two groups 
of animals acquisition training in a 
runway with food pellets presented 
in the end box in a glass cup. During 
extinction one group was presented 
with the empty cup on every trial, 
and one group was presented with the 
empty cup on only half of the trials; 
on the other half the food cup was 
absent. There was a greater stimulus 
change for the group that had the 
cup on only half of the trials, yet this 
group extinguished slowest. In a 
more elaborate experiment, Brown 
and Bass (1958) varied the stimulus 
conditions—different alleys—in both 
acquisition and extinction. They 
found that variable extinction
conditions resulted in increased resistance
to extinction. In both of these experi- 
ments, stimulus change increased re- 
sistance to extinction instead of de- 
creasing it as the discrimination hy- 
pothesis would seem to demand, but, 
it should be noted, only 100% rein- 
forcement was used in both studies. 
In addition, Brown and Bass found 
that irregular stimulus conditions 
during acquisition had no effect on 
resistance to extinction. This seems 
to run counter to previous evidence 
(MacKintosh, 1955), but Brown and 
Bass have the better designed study 
in that theirs was a factorial com- 
bination of variables in both acquisi- 
tion and extinction. MacKintosh 
extinguished all Ss under constant 
conditions, making it impossible, as 
Brown and Bass point out, to dis- 
tinguish between learning condition 
irregularities and the change from ir- 
regular acquisition to constant ex- 
tinction. Both the Brown and Bass 
and the Marx study are consistent 
with much previous evidence (e.g. 
Glazer, 1958) indicating that stimulus
novelty and change increase
performance. It seems, then, that one
must at least conclude that stimulus 
change during extinction must be 
weighed more heavily in increasing 
resistance to extinction than con- 
stancy of stimulus conditions from 
acquisition to extinction. In any 
case, the studies of Brown and Bass
(1958) and Marx (1958) complicate
the task of discrimination theorists.

Secondary reinforcement. Denny
(1946) is usually given credit for the 
formal introduction of the secondary 
reinforcement hypothesis into the 
area of partial reinforcement. He 
pointed out that on reinforcement 
trials the stimuli of the goal box are 
associated with primary reinforce- 
ment and therefore should acquire 
secondary reinforcing power. On
nonreinforced trials and during
extinction, secondary reinforcement
should be taking place which would
retard extinction.

The attack on the secondary rein- 
forcement hypothesis began with 
Bitterman, Feddersen, and Tyler 
(1953) and Elam, Tyler, and Bitter- 
man (1954). They found that rats 
extinguished in the goal box in which 
they had received their reinforced 
trials—the nonreinforced trials were 
given in a very different goal box— 
showed less PRE than those extin- 
guished in the goal box which was 
present for nonreinforced trials. Ac- 
cording to a Hullian (1943) secondary 
reinforcement hypothesis, the results 
should have been just the opposite. 
The stimuli associated with primary 
reward and present during extinction 
should have prolonged extinction 
performance. In only slightly differ- 
ent situations Notterman (1951) and 
Freides (1957) have duplicated the 
significant aspects of these studies. 
They found no difference between the 
two relevant groups, but this still
does not support a secondary rein- 
forcement interpretation. 

Mason (1957) has reported an in- 
teresting experiment. His animals 
learned two discriminations. In one, 
the positive stimulus was rewarded 
100% of the time; in the other it was 
rewarded 50% of the time. On the 
test trials both positive stimuli were 
presented and the S's preference on 
successive trials constituted the re- 
sponse measure. Most Ss chose the 
stimulus that had been rewarded 
100% of the time, even when twice 
as many total trials had been given to 
the 50% rewarded stimuli, so that the 
total number of primary rewards in 
the two situations was the same, but 
the 50% group would have had sec- 
ondary reinforcement in addition. 
These results were interpreted as
opposed to a secondary reinforcement
hypothesis.

The results of Fehrer (1956) can 
also be interpreted as opposed to a 
secondary reinforcement interpreta- 
tion of partial reinforcement. Two 
of her groups involved goal box de- 
lays. One was given a 30-sec. delay 
before a 10-sec. eating period, and 
one was given a 10-sec. eating period 
followed by a 30-sec. delay. Another 
group was allowed to eat for a full 
40-sec. in the goal box. The goal box 
stimuli had a longer association with
primary reward for the latter group 
and should have resulted in slower 
extinction than the other two. Just 
the opposite happened; both delay 
groups extinguished slower than the 
nondelay group. 

Hulse and Stanley (1956) reported,
but only at the .10 level of
significance, that the PRE occurred
only when the S^r was present on
every training trial.

The evidence, at this point, seems 
quite conclusive that secondary rein- 
forcement is not the sole explanation 
of the PRE, but that it can be ex- 
ceptionally efficacious was indicated 
by Zimmerman (1957), who used a 
free responding situation. He
suggested that when an S^D is irregularly
associated with reinforcement, it
loses none of its S^D function and
maintains its S^r function over a
considerable time. It is most effective as
an S^r when presented irregularly
again. Zimmerman’s study is based
on very few animals, and needs to be
replicated before it can be accepted
with absolute confidence, but it
seems probable that a stimulus, when
irregularly associated with
reinforcement and then used irregularly as an
S^r, can have very powerful
reinforcing value. At this point, however, it
becomes difficult to distinguish
between a discrimination hypothesis
and a secondary reinforcement one.
It seems likely, in fact, that they are
basically the same.

Competing response. Weinstock
(1954) has presented an “habituation”
theory which states essentially
that nonreinforced responses
“habituate” by some unspecified process
and drop out. For a group of
partially rewarded Ss, competing
responses are made during acquisition
on the nonreinforced trials, and
these competing responses drop out.
Thus when extinction begins, the
competing responses have already
habituated and the instrumental
response continues strongly. No
habituation of the competing responses
occurs during acquisition for the
continuously reinforced group, and their
onset during extinction results in a
rapid decrease in the instrumental
response.

The only experiments aimed
directly at testing this notion—although
Tyler (1956) has pointed
out that Weinstock’s theory does not
explain why random reinforcement
results in a greater PRE than
alternating reinforcement—are those of
Stanley and Clayton (1955) and
Hulse (1958). Stanley and Clayton
assumed that if Ss were delayed in
the goal box, more competing
responses would occur and thus
habituate. They gave immediate
reinforcement and a 30-sec. delay to two
groups during acquisition and broke
these two down factorially to
immediate goal box removal and a
30-sec. delay during extinction. The
acquisition delay group was not
more resistant to extinction than the
immediately rewarded group. These
results are opposed to Weinstock’s
habituation theory, if habituation is
a function of time in the goal box,
but they are also opposed to the
results of Fehrer (1956) who found
that 30-sec. delays, both before and
after the reinforcement, retarded
extinction.

In another study, Hulse and Stan- 
ley (1956) presented a theory some- 
what similar to that of Weinstock 
(1954). They also argued that com- 
peting responses occur during the 
nonreinforced acquisition trials, but 
for them the competing responses do 
not “habituate.” The Ss learn to do
something other than eat during 
these trials. When extinction starts, 
the partially reinforced Ss do the 
“something else” that they learned 
during acquisition and are quickly 
removed from the stimuli which 
evoke eating, and thus the condi- 
tioned eating response is protected 
from rapid extinction. At this point 
Hulse and Stanley follow the notion 
of Sheffield, Roby, and Campbell 
(1954), that the faster extinction of 
conditioned eating leads to a faster 
loss of the instrumental response. 
This is because the conditioned eat- 
ing response works backward over 
the maze stimuli and constitutes an 
important part of the stimulus situa- 
tion leading to the instrumental re- 
sponse. 

Hulse and Stanley's theory ap- 
pears to be more specific than Wein- 
stock’s because it does not contain 
such a vague term as “habituation” 
playing an important role. Also 
Hulse and Stanley indicate how the 
goal box competing response can af- 
fect behavior occurring at the be- 
ginning of the apparatus. But this 
notion, too, runs into difficulty when 
Freides’ (1957) data are considered. 
He found that behavior in the goal 
box (approaching food) could ex- 
tinguish while a runway response 
remained strong.

Mediational responses. It should
be noted that the theory of Hulse and
Stanley (1956) involved a mediating
conditioned eating response and
could very properly be considered in
this section also.

Wilson, Weiss, and Amsel (1955) 
and Amsel (1958) have argued for a 
mediating frustration response as an 
explanation of the PRE. During 
partial reinforcement an emotional 
response develops on the unrein- 
forced trials. The emotional re- 
sponse works backward over the 
maze stimuli and is evoked by the 
stimuli of the start box. The emo- 
tional response has stimulus proper- 
ties, as do all responses, and these 
stimuli become conditioned to the 
instrumental response. Since the 
emotional response does not get 
conditioned during acquisition to 
the instrumental response for the 
100% Ss, they show a sharp re- 
sponse decrement during extinction. 

Essentially the same idea is
presented by Kendler, Pliskoff,
D'Amato, and Katz (1957), except
that no special properties, such as
emotionality, are ascribed to the
mediating response. Lewis and
Duncan (1958b) also used a mediating
response to interpret their data and
show how some language behavior is
related to it.

Logan, Beier, and Kincaid (1956)
maintain that resistance to
extinction is a direct function of the degree
to which the mediating response
persists beyond the time at which
reinforcement usually occurs.
“Postreinforcement time cues” do not occur
during 100% acquisition because
reinforcement is given every time as
soon as S enters the goal box. They
do, however, occur during
extinction, evoking the mediating
response and quick extinction results.
With longer delays of reinforcement
during acquisition such as occur in
a varied delay procedure, the
occurrence of mediating responses to
postreinforcement time cues will be
increased because the mediating re- 
sponse is eventually reinforced after 
the delay. Thus the mediating re- 
sponse continues over a longer time 
for the partial delay Ss, and gets 
rewarded for being prolonged. It 
will occur over a longer period dur- 
ing extinction also. Because the 
mediating response lasts longer, and 
is conditioned, presumably, to the 
instrumental response, the instru- 
mental response will also last longer. 
Logan, Beier, and Kincaid (1956) 
are not entirely clear on this point, 
however. 

Expectancy. Expectancy “theory”
was brought into partial reinforce- 
ment by Humphreys (1939). He 
considered that his early partial rein- 
forcement studies were contrary to a 
drive reduction point of view, but 
that the data could be handled 
nicely by a concept of expectancy. 
Basically, he argued that partial 
reinforcement resulted in an ex- 
pectancy of irregular reinforcement 
and that continuous reinforcement 
resulted in an expectancy of regular 
reinforcement. He further stated 
that it was easier to change from a 
regular expectancy of one kind (that 
rewards occur on every trial) to a 
regular expectancy of another kind 
(that rewards do not occur on any 
trial), than it was to change from an 
irregular expectancy (that rewards 
occur on only some of the trials) to a 
regular expectancy. 

Lewis and Duncan (1957) at- 
tempted to test Humphreys’ notion 
by having their Ss state, before each 
trial, the confidence they held that 
they would win or lose on the next 
trial. They found no evidence in 
their data for Humphreys’ notion
that those with irregular
expectancies of winning would continue longer
during extinction than those with
regular expectancies. Insofar as
Humphreys meant some over-all 
kind of expectancy, these data are 
not pertinent since they are trial-by- 
trial expectancies. 

The two-light Humphreys board 
has been a standard apparatus for 
studying expectancies—“yes” and
“no” statements—for a number of 
years. Recently several studies have 
appeared with this device which have 
used a shift in stimulus presentation 
probabilities. In order to respond 
to the postshift probabilities, pre- 
sumably the responses to the pre- 
shift probabilities must extinguish. 
Thus this situation seems to afford a 
means, at least indirectly, of study- 
ing expectancies during extinction. 

Parducci (1957), with a two- 
choice betting game, gave separate 
groups three preshift probabilities— 
15%, 50%, and 70%. Then all 
groups were shifted to 70%. He 
found more complete adjustments to 
the postshift probability when the 
magnitude of the shift was greatest. 
He interpreted his results according 
to a discrimination theory and coun- 
ter to Humphreys’ regularity of ex- 
pectancy analysis. He also con- 
cluded, although the evidence is not 
very clear on this point, that the ex- 
pected permanence of shift was more 
important than the magnitude of 
shift. 

Goodnow and her colleagues 
(Goodnow, 1955; Goodnow & Petti- 
grew, 1955; Goodnow & Pettigrew, 
1956; Goodnow & Postman, 1955) 
in a number of studies have hy- 
pothesized that guesses would be 
more likely to approach 100% to the 
most probable light if the situation 
were a “chance” one than if it were 
a “problem solving” one. James and 
Rotter (1958) also hypothesized that 
the “chance” and “skill” situations 
would make a difference to
resistance to extinction. They gave one
set of instructions to indicate that 
the task involved chance only, and 
one set to indicate that the task was 
skill only. Instructions were com- 
bined factorially with 50 and 100% 
reinforcement. Their hypothesis was 
confirmed. With skill instructions, 
the 100% group took longer to ex- 
tinguish than the 50% group. With 
chance instructions, just the reverse 
was true. James and Rotter inter- 
pret their results as the effects of 
symbolic processes of some kind, 
whereas Goodnow and Postman 
(1955), in a slightly different situa- 
tion, maintain that “awareness” is 
not a necessary part of such proba- 
bility discriminations. 

Perhaps every study cited in this
report could be reinterpreted ac- 
cording to an expectancy notion, 
and that is the main weakness of 
such a point of view. There seems to 
be no way of disproving it. Cer- 
tainly not much recent research has 
been oriented around expectancies, 
and this concept is apparently mori- 
bund as an explanation of the PRE. 


CONCLUSION 


The writer began this review of 
the literature on partial reinforce- 
ment in the hope of arriving at some 
explanation of the PRE. Unfor- 
tunately, he is at least as far away 
from an explanation now as when he 
started. Conflicts and contradic- 
tions in data contribute their share 
of confusion, but probably more im- 
portant has been the absence of the 
right kind of data. In the data sec- 
tion of this paper an attempt was 
made to discover parametric laws. 
Except in very few instances, this 
was impossible. Not many experi- 
menters seem to be interested in how 
one variable relates to another along 
the major range of both variables.
Most experimenters are interested in
“theory” testing. As a result we 
have a large number of two or three 
group experiments, using a widely 
different array of apparatus, some- 
times combined in factorial designs, 
telling us that our theoretical no- 
tions are largely inadequate, but not 
telling us a great deal more. To de- 
termine a parametric law probably 
at least five points are needed along 
each dimension. With fewer than five 
points, the task of stating the law— 
describing it by an equation—is rela- 
tively trivial. With five or more 
points, and a small number of con- 
stants, the dimension can be de- 
scribed with some precision, assum- 
ing the data are reliable and that a 
dimension is actually present.
Certainly curve fitting can be an
arbitrary procedure, and certainly it is
not all there is to theory, but, also 
certainly, a theory ought to be about 
something, and parametric data 
make a wonderful subject matter. 
But this is just saying, in a more 
elaborate way, “more research is 
needed.” 
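
The point about the number of points per dimension can be illustrated with a minimal sketch. It assumes, purely for illustration, a two-constant law of the form y = a + b*x fitted by least squares; the function name and every data value below are invented for the example and are not taken from any study reviewed here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x.

    Returns (a, b, rss), where rss is the residual sum of squares.
    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


# Invented data: percentage of reinforcement against trials to extinction.
two_group = ([0, 100], [40, 20])
five_group = ([0, 25, 50, 75, 100], [40, 52, 47, 35, 20])

# With as many constants as points the line passes through every point,
# so the fit cannot fail and tells us nothing about the form of the law.
print("two points:  rss =", round(fit_line(*two_group)[2], 6))

# With five points and two constants the residuals can expose a poor fit.
print("five points: rss =", round(fit_line(*five_group)[2], 6))
```

With two groups any two-constant equation fits exactly, which is the sense in which stating the law is trivial; only with more points than constants do the residuals carry information about whether the assumed form is adequate.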

The writer feels no desire at pres- 
ent to carry cudgels for any of the 
“theories” now available, nor does 
he have a theory of his own to con- 
tribute that he has any confidence in, 
nor does he think it sporting to take 
further “pot shots” at the existing 
theories. The writer remains almost 
as empty of an understanding of 
partial reinforcement now as when 
he began to review the literature, but 
he still considers the problems in the 
area to be fascinating.

On every hand, one hears the cry
that a new continent has been dis- 
covered in the realm of visual science. 
Its supposed magnitude is indicated 
by the titles of some of the articles 
written about it—“A Bombshell in 
Color Theory”; “An Astonishing 
New Theory of Color”; “New Light 
on the Eye”; and so on. 

The discoverer is Edwin H. Land, 
originator of Polaroid and the one- 
minute camera, and an undoubted 
inventive genius in the Edison-Bur- 
bank-Kettering tradition. The claims 
and predictions made by him and for 
him are not based upon mere novelties 
of technology, new optical tricks, 
new types of filters or of photo- 
graphic dyes. The underlying dis- 
coveries are supposedly so very basic 
that “a theory of color that has 
stood for nearly 300 years has
suddenly been overthrown . . . the eye
does not need red wave lengths of
light to see red, does not need orange
to see orange (etc.)”; for “scientists
since Newton have been completely
fooled about the way the eye sees
color” so that “every textbook
dealing with color will have to be
rewritten” (Bello, 1959). We are told that
“the recent phenomenon of Edwin
Land’s essentially 17th-century type
of discovery in the field of color vision
. . . really shakes scien[...]”
(Weaver, 1959).
Land’s findings are said to be
[...] all workers
in the field of color repro[...]

[...] demonstrations made with [...] slide
projectors and usually only one [...] filter have
achieved multicolor effects which
sometimes could not [...] from
Ektachromes . . . Land's [...]
practical, economical one-step
method of achieving
full-color prints as simply as black-and-white 
pictures are now made in Polaroid cameras.’ 
The applications, however, could be much 
more widespread and commercially important. 
Besides motion pictures and television, which 
are closely related to the present projection 
picture tests, a perfection of a similar system 
for reflection prints might . . . simplify color
ink-on-paper systems” (Cros, 1959). 


Again, while Land's discoveries 


are not yet ready to be applied to color photog- 
raphy [still] there seems little question that 
this new knowledge will eventually lead to new 
color-reproduction techniques that will make 
the present methods look primitive and clumsy.
Television especially should benefit from the 
knowledge that the eye can utilize an entirely 
different sort of information from what had 
formerly been thought necessary to produce 


color (Bello, 1959, p. 205). 


It will be noted that none of the 
above quotations is attributed to 
Land. What he himself has said in 
print is more conservative. The whole 
presentation of his material was 
planned for the Proceedings of the 
National Academy of Sciences. Only 
Parts I and II have so far appeared 
(Land, 1959a, 1959b), but they are
[...]ly dignified. His Scientific
American article (Land, 1959c) is in 
a more popular vein, but Land says 
that wherever it is more excited or 
seems to claim more than his other 
two articles, this represents editorial 
change over which he had no control. 
He did not even know that the paper 
of Cros (1959) was in preparation, 
and he disclaims responsibility for 
the extravagant article of Bello
(1959).

1 Land is developing [...] perfect [...]
one-step camera, but it makes [...]
color-visual principles Land thin[...]
jection phenomena hi[...]

The story goes that early in 1955
Land did begin to consider the
possibility of devising a quick process for
the Polaroid camera which would yield 
a print in color in a single while-you- 
wait operation. To begin to inform 
himself at firsthand of the problems 
that would be involved, Land set 
up, in three projectors, orthodox 
color-separation transparencies which 
had been prepared photographically 
through conventional red, green, 
and blue gelatine “taking” filters. 
Screened in registry through similar 
filters, the images built up additively 
a full-color reproduction of the orig- 
inal camera scene. 

While trying various filters, beam
intensities, and so on, Land
happened to note that when the
blue-filtered projector was turned off there
was only a little deterioration of the
quality of the reproduction on the
screen. In a further “accident,” Land
removed the filter from the green pro- 
jector. The intensified white light 
from it made the picture quite un- 
satisfactory. But Land’s collabora- 
trix, Meroé Morse (who for a decade 
has been Land’s right arm), noticed 
that on the screen there were still 
traces of hues which seemingly should 
not have been there. At the mo- 
ment, this was dismissed as a “‘fa- 
tigue’’ (afterimage) effect. Late that 
night, however, Land returned to the 
setup and dimmed down the white 
projector. “Instantly, the scene 
burst into lifelike color’’ (Bello, 
1959); and since with indefinitely 
long observation the color array per- 
sisted, it was clearly no afterimage 
phenomenon. 

From then on Land’s copy, whether 
still life or animate, was regularly 
photographed through only two 
filters. A positive transparency made 
from a black-and-white negative 
taken through a red filter (passing 
only the longer wave lengths)
constituted a “long record” and was
ordinarily projected through a red
filter. A negative simultaneously
made through a broad-band green
filter provided a black-and-white,
positive “short record” of the copy.
This was rather a medium record, 
considering what wave lengths and 
energies would have come through a 
blue taking filter. The second pro- 
jector, in which it was placed, usually 
had no filter over its front lens. 

Any object in the original scene
was now represented on the screen
by a patch of red light coming 
through a neutral area of a particu- 
lar density in the long record, and by 
a congruent patch of white light sent 
through an identically shaped and 
sized area of the short record, having 
(except by chance) a higher or a 
lower neutral density. The screen 
“object” was a particular ratio-and- 
sum of red- and white-light
luminances.

An elaborate picture made up of
just such objects “should” have
presented them all as reds, pinks, whites,
and blacks, depending upon the rela- 
tive and absolute densities of the 
corresponding patches of the long- 
and short-record transparency slides: 
mixtures of red and white lights in 
the dark-surrounded 3° field of such 
an instrument as a tricolorimeter 
could not yield orange, yellow, green, 
blue, or purple. Land’s screen images 
did show all these hues and,
moreover, in “correct hierarchical order”;
that is, warm-colored objects of the 
copy were rendered by about the 
same warm colors on the screen, and 
cool-colored objects were represented 
by cool colors in the picture. Skin 
looked like skin—in fact more so 
than in many Ektachrome and Koda- 
chrome pictures. 

To Land’s mind, here was a great 
mystery. Where were all the extra
colors coming from, beyond those
his projection lights could mix to 
give, when classical color theory 
could not explain them? By ‘‘classi- 
cal color theory” Land does not mean 
isolated color-vision theory (Young- 
Helmholtzian, Heringian, or other), 
but this together with the relations of 
hues to wave lengths in the white- 
light spectrum, Grassmann's laws of 
color mixture, and particularly the 
fact that to obtain all hues and white, 
three additive primary lights are 
considered to be necessary and suf- 
ficient. 

If the light coming to the eye from 
a patch of screen was seen as having 
a color which a colorimetrist would 
not expect to see in a mixture of 
those two lights at those intensities, 
then classical color theory was inade- 
quate and outmoded. Some entirely 
new theory had to be developed to 
account for all the unexpected colors 
which one obtained by two-light,
one-color projection on a screen (and
perhaps in no other way).

Quite soon (May 1955) Land gave
the first of his public lectures with
demonstrations. The matter then
went back into the laboratory, to
emerge again in 1957. In that year
and the next, Land made
presentations to various scientific meetings
and university groups, at
Washington, Detroit, Rochester, Harvard,
and Columbia. The order in which
the phenomena came to be shown
is outlined in Land's first
publication [...], where they are
described as numbered experiments.


In an orderly way these
experiments bring out that: either the red
picture or the white picture is
monotone or achromatic, but the registry
of both makes a tremendous
difference since full color then appears;
the picture reappears
instantly when a flood of room light,
which has washed out the screen
image, is turned off; enough room 
illumination can be permitted to en- 
able the perception of colors apart 
from the screen, and will not inter- 
fere with the colors on the screen; 
changing the subtense of an “object” 
at the observer's eye does not change 
its color; interchanging the long and 
short records turns red hair and lips 
greenish or blue-green, showing that 
the colors are not memory colors nor 
expectation colors; the hues seen are 
not much changed by large altera- 
tions of beam intensity ratios nor by 
doubling the contrasts in the short 
record with superposed duplicates; 
only red, pinks, and white are seen 
if the long-record slide is removed 
from the red-filtered projector—al- 
though many intensity ratios are 


still there; and so on.

Land’s oral presentations in
Eastern cities never came strongly to the
awareness of us who stay on the West 
Coast. I first learned of his projec- 
tion color phenomena when a col- 
league, who travels much more, 
pressed Part I (Land, 1959a) upon 
me with the solemn assurance that 
this was the most important develop- 
ment in color vision in the past 50 
years. I read it and forgot it, for I 
could not see that anything more 
than “simultaneous color contrast” 
was involved in Land's securing of 
hues from red to blue-green when his 
projection lights were red and green, 
or red and tungsten “white.” Much 
later, Land’s popular article (Land, 
1959c) was forced upon me, when I 
was requested to deal with it as a sec- 
tion editor for a condensation jour- 
nal. By this time Part II (Land, 
1959b) had appeared, as also the arti- 
cles in Fortune, Graphic Arts Monthly, 
and Architectural Forum (Anon.,
1959), which were stirring up great
excitement and wonderment among
all sorts of both scientific and
nonscientific people.*

In print and in person, Land 
makes about a dozen serious claims. 
They cannot be dealt with in either 
the direct or the reverse order of 
their “importance,” for lesser ones 
have to be explained before greater 
ones can be appreciated and vice 
versa. They will be taken up here, 
therefore, in more or less an order 
of comprehensibility. First, however, 
there is to be considered the ques- 
tion of novelty of fact. 

Land no longer makes a claim as 
regards the novelty of the rich vis- 
ual results of his simple means. In 
1955-1957 he learned that he was by 
no means the first to find that two 
color-separation transparencies are 
much more than two-thirds as good 
as three, and that when they are 
added, one color filter is much more 
than half as good as two. His latest 
article (Land, 1959c) gives no hint 
that he knew this (perhaps, one of 
the editor’s doings?), but in his first 
paper (Land, 1959a) a footnote ad- 
mitted it. As early as 1914, motion 
picture systems were patented in 
which alternate frames were taken 
through a red filter and no filter or 
through red and green filters, and 
projected through red and green
ones or through a red alternating
with none.

* It appeared that I should digest these
publications and write [...] commentary
covering all of them. I wrote to Land
to ask whether [...] which I should [...]
I was invited to Cambridge [...] I might
[...] or dis[...] communications,”
to his laboratory, [...]

These systems, and various others 
which used no filters, but two-dye 
films which dodged the great diffi- 
culties of making three-dye-primary 
films, gave excellent results so long 
as the cameraman could dictate the 
colors of the copy. Pure blues had to 
be foregone; but there are so few 
such colors in outdoor nature that 
they were easily avoided, and the
faithfulness of color reproduction
was satisfactory. If flesh tones, i.e.,
skin colors, were good then every- 
thing else was good enough. It ac- 
tually seemed a waste to use a three- 
color process for a western movie, or 
even for indoor stories where furni- 
ture and costume colors could be se- 
lected to avoid those which would 
come out “wrong.” 

Even Land’s own favorite method 
for still pictures is not new. In a 
memorable Optical Society meeting 
in New York on March 5, 1943, I 
witnessed demonstrations by Ralph 
Evans of Eastman Kodak Company, 
in one of which he projected long and 
short records taken through the same 
filters Land now uses, and with red 
and yellow filters on the two
projectors. The hues on the screen,
according to notes made at the time,
included red, orange, yellow, green, 
purple, and magenta. Unwittingly, 
Land has even used subject matter 
similar to one of Evans’s own fav- 
orites: a redheaded girl wearing a 
green sweater. The two techniques 
could hardly be closer, for Land was 
then using red and orange filters; 
and even when he is using no filter 
on one projector and calls the light 
from it white, this is the strongly 
yellowish light coming directly from 
a tungsten filament. 

With long and short records of 
elaborate still-life copy such as
assemblages of groceries or bric-a-brac,
projected with red and green
beams or with red and yellow 
(‘white’) ones, it is clearly not the 
results on the screen which are novel, 
but Land's explanations of them. 
Land's full-color results are only the 
latest of a series of rediscoveries of 
the same phenomena—first seen, in 
two-primary projected photographs, 
at least as early as 1897. 

A claim, as good as any to commence
with, is that it is just because
the subject matter is elaborate,
and rendered by good photography,
that the many colors appear at all. 
What is on the screen, Land calls a 
total image, a complete image, a 
natural-image situation. The colors 
seen, except for those of the projec- 
tion lights and additive mixtures 
thereof, do not appear when the same 
lights are mixed in a simple isolated 
spot. Classical color-vision theories, 
classical color-mixture laws, do not 
explain or predict the extra colors. 
In any place on the screen, the wave 
lengths and energies just there do 
not determine the color seen there. 
If changing the wave lengths changes 
the colors, it does not do so accord- 
ing to the rules of additive color 
mixture. The rays of different wave 
length are not themselves color- 
making; instead, colors come from 
the interplay of longer and shorter 
wave lengths over the entire scene, 
and it does not much matter what 
those wave lengths are so long as 
some are longer and some are shorter. 
The colors of monochromatic lights
and those of colorimeter mixtures
are all one big special case, and everyone
since Newton has been misled by [...]
how colors arise in “natural” [...].

It is easy to [...] is Land’s color ar[...]
special case. If his screen [is viewed]
through a slender tube, only spots of
red, pink, and white can be found;
all other colors vanish. One can take
the same tube around the lighted
room or outdoors, and view anything
and everything, but hardly
any objects will change color
at all when seen thus. Land's screen
pictures are most unnatural images. 

Now it is perfectly true that the 
wave lengths and energies at a screen 
locus do not determine the color 
seen there. But this does not mean 
that the color is independent of the 
light at that locus. It is utterly
dependent upon the light in the “object”
in relation to the light around
the object. Even in situations much 
simpler physically, it is easy enough 
to see a spot as having any of several 
hues when spot and surround are 
constituted of nonselective materials 
and illuminated by a single mono- 
chromatic light. The relative in- 
tensities created by relative reflect- 
ances are governing here, as shown 
by Helson (1938) and by Judd (1940), 
and expressed in Helson’s principle 
of “color conversion” with respect 
to adaptation levels. In Land’s ma- 
terial, one ratio-and-sum of two 
projection lights represents the ob- 
ject. Other ratios-and-sums sur- 
round it—if they were not “other,” 
there would be no object, for there 
would be no contour. 

In his own kind of quantitative 
analyses, Land has paid much atten- 
tion to the ratio of the intensity of 
each kind of light in each object to 
the maximal intensity of that same 
light (at some place on the screen). 
He thus takes careful note of the 
relative intensity of each light in the 
object, and of the intensities of the 
two object-lights relative to each 
other. He completely neglects the 
other absolute and relative inten- 
sities of the same two lights which are 
hitting the screen around the object’s
contours. There are spot-and-surround
situations all over Land's
screen, and the color of any spot is 
at the mercy of a different spectral 
composition in its surround unless 
the luminance of the spot is much 
higher than that of the surround. No 
wonder, then, that the color seen in 
any spot seems to be “independent” 
of the physics of what is in the spot. 
To a colorimetrist, Land’s most 
prized phenomena would constitute 
intolerable artifacts. Colorimetry 
must be done in a dark or at least a 
neutral surround. If a sample were
matched by mixing in a colored sur- 
round of uncontrolled chromaticity, 
the specification found for the sample 
would be quite meaningless. 
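
The objection can be made concrete with a small numerical sketch. It is offered only as an illustration under stated assumptions, not as Land's procedure or as a model of vision: the intensity values are invented, and `long_share` (the fraction of an area's light supplied by the long-wave beam) is a hypothetical stand-in for "spectral composition." The same spot, with identical Land-style coordinates, stands on opposite sides of its surround in the two cases.

```python
def land_coordinates(area, max_long, max_short):
    """Each light in an area as a fraction of that light's screen maximum."""
    long_i, short_i = area
    return long_i / max_long, short_i / max_short


def long_share(area):
    """Fraction of the total light in an area supplied by the long-wave beam.

    Hypothetical measure, introduced here for illustration only.
    """
    long_i, short_i = area
    return long_i / (long_i + short_i)


def spot_minus_surround(spot, surround):
    """Long-wave share of the spot relative to that of its surround."""
    return long_share(spot) - long_share(surround)


# Invented (long, short) intensities, in arbitrary units.
spot = (0.4, 0.4)
reddish_surround = (0.9, 0.3)
pale_surround = (0.2, 0.6)

# The spot's Land-style coordinates do not depend on the surround.
coords = land_coordinates(spot, max_long=1.0, max_short=1.0)

# Negative: the spot is poorer in long-wave light than this surround.
vs_reddish = spot_minus_surround(spot, reddish_surround)

# Positive: the spot is richer in long-wave light than this surround.
vs_pale = spot_minus_surround(spot, pale_surround)
```

The two coordinates in `coords` are the same in both cases, while the sign of the spot-minus-surround difference reverses. A description that records only the first quantity cannot distinguish the two situations.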

The explanation of his extra colors 
which Land obviously must contro- 
vert, before he can expect our at- 
tention to any other explanation, is 
one in terms of simultaneous color 
contrast. The expression I myself 
prefer is “spatial induction (of com- 
plementary hue)” for which I shall 
use induction for short, here. Still a 
third term for the same phenomenon 
is “lateral adaptation” which Evans 
(1943, 1948) points to, along with 
“general” and “local” adaptation, 
as being involved in the viewing of 
any sort of picture in color. For the 
special case where both spot and sur- 
round have the same chromaticity 
but different intensities, Helson’s
classical “color conversion” is at
work.

There are certain facts about induction
which are not generally
known, but which are involved in
Land’s situations (with their action
concealed there), and which can be
brought out also in the simplest
spot-and-surround situations. One usually
thinks of induction as it is shown
off in an elementary laboratory
course, with a square inch of gray
paper laid upon a large sheet of 
colored paper. Here, the phenome- 
non seems tame indeed. The com- 
plementary coloration induced into 
the gray is not very convincing, al- 
though it becomes more so if a sheet 
of tissue is laid over all. Tongue in 
cheek, one tells students that this 
blurs the contour, and that this fa- 
cilitates induction across it. 

The induction product is far more
vivid when it is obtained on a screen
with a broad annulus of colored light 
surrounding a disc of white light 
from a second projector. At an 
optimal intensity of the spot, its 
saturation is maximal. Here again, 
the effect is promoted if the mutual 
contour of surround and spot is de- 
liberately destroyed, by moving 
either slide so that disc and annulus
are a bit out of registry.

If now a flood of dim white light
is put over the screen with a third
projector and gradually increased in
intensity, one finds that the colored
annulus is quickly washed out—a
newcomer entering the room would
not dream that any colored light
was there; but, the colored spot is as
saturated as ever, Kirschmann’s laws
to the contrary. Specifically, if the
annulus is blue the spot is yellow,
and when the white wash has completely
desaturated the blue the spot
still glows like a sun. The durability
of the induced color has to be seen to
be believed—the white wash cannot
wash it out. Here is a color which
has no visible means of support, and
is as unexpected as any color in one
of Land’s pictures; for, a spectrophotometer
will show only white
light in the spot and the eye sees
only white surrounding it.

When such colors are induced into 
the very real-looking objects which 
even a black-and-white transparency
puts on a projection screen,³ their
vividness—their apparent satura- 
tion, if the term is allowable—is tre- 
mendous. 

Whatever hue is induced into a 
spot is inevitably complementary to 
whatever light, in the surround, is 
doing the inducing. This is not to 
say, however, that the surround and 
spot will necessarily be seen in com- 
plementary colors. In Land’s pic- 
tures, this is rarely the case—which 
unquestionably influenced him to 
reject a “contrast” explanation. In 
the first place, complementariness 
here is not in the sense of mixture
complements—those paired lights
which, mixed in a particular intensi- 
tive ratio, will comprise a white light 
—but in the sense of a colored light 
and the color of its after-image. The 
mixture complement of a light and 
its induction or after-image comple- 
ment are not just the same in hue; 
there are differences, for known 
reasons (Helson, 1938; Judd, 1940; 
Wilson and Brocklebank, 1955). In 
the second place, a physically chro- 
matic surround may not appear 
colored at all; or the color seen in a 
spot may be “mixed” from the direct 
color of the light actually there and 
another color induced into it. When 
a surround is mixed from two beams 
of light they may both be colored, or, 
e.g., red and tungsten “white,” and 
in either case the mixture will tend 
to be seen as white in consequence of 
general chromatic adaptation. But 
the red-light content acts as if it 
were alone, in causing lateral adaptation
of the retinal area subtending
the spot. Truly white light in the
spot would now appear white-minus-red,
i.e., blue-green. But if the spot
light is itself a little reddish, some of
its direct redness can combine with
the induced greenness and a little
of the blueness to give a basis for
neutrality, and the remaining direct
redness and induced blueness will
make the spot purple. If on the
other hand the spot light is yellowish,
this direct huedness may kill the
blueness of the hybrid induced hue
and allow the spot to appear purely
green. Furthermore, if a spot is substantially
brighter than its surround
it can exhibit the hue of either projection
light or any additive mixture
of the two, while in a spot much dimmer
than its surround, induced blackness
can create maroon, navy blue,
olive, brown, and other “synthetic”
surface colors. All of these possibilities
give way to others, with any
manipulation of the ratio and sum of
the two kinds of light blended in the
surround.

3 An almost forgotten word for a slide
projector is stereopticon (most likely to be
used, nowadays, by mistake for stereoscope).
It derives from the corporeality which images
[have] when one sees them on a textureless
screen and is not forced to see them in a
plane, as on a paper print.

Land’s extra colors have a kinship, 
stronger than he admits, with colored 
shadows. Produced on a screen with
two projectors and a filter or two,
shadow colors always come in pairs,⁴
which have interesting properties.
If both projection lights are colored
but not complementary, then quite
regardless of what their colors are,
the hue of neither shadow is ever the
hue of either light since the surround
of each shadow is not purely either
light. But, the hues of the shadows
are invariably complementary to
each other. This fact has been known
so long that its first discoverer is lost
—while its most recent investigator
(H. Self, a student of Helson's) has
not yet published his work. If the
light from one projector is truly
white and the other light is hued—
say, green—we have a special case.
The hue of one shadow will be that
same green since the surrounding
white-plus-green can induce nothing
that will alter it; the hue of the other
shadow will be red-purple, induced
into the white light in the shadow by
the impure green light outside the
shadow. Here, side by side, are a
“purely physical” (green) shadow
and a “physiological-psychological”
(magenta) one. It is important to
note that both shadow colors can be
highly and equally saturated while
the rest of the screen is acceptably
white and necessarily brighter than
either shadow—again Kirschmann’s
laws to the contrary.

4 The most familiar of all colored shadows,
the blue ones in a snowscape, may appear to
be exceptional. When objects are illuminated
both by yellow sunlight and white cloud-light,
blue is seen wherever an object blocks the
sunlight from the snow. The unseen (adapted-out)
yellow surrounding the shadow induces
complementary blueness into the white light
which, alone, occupies the shadow. No yellow
mate to the blue shadow is seen, because the
huge white cloud source creates no shadows at
all.

With Land-type long- and short- 
record slides in red and white pro- 
jectors, one has only to displace one 
slide to throw the two records out of 
registry on the screen. One then sees 
that the “objects” in one slide are 
represented by a set of [...]
putting u[...]
other (and complementary) [...]
of course the two h[...]. Pushing
[...] back into registry, [...]
for a time—that the
colors are all created [...]
two com[...].

[...] full color, this is an exaggeration
in two respects. The term implies a 
faithfulness of reproduction of the 
copy colors greater than Land does 
get. It further implies that all possi- 
ble hues are there, or obtainable, 
even if some objects have slightly 
wrong hues. The composition on the 
screen does exhibit many, many 
colors (a color being a hue at a given 
saturation and a particular light- 
ness). If the observer penetrates the 
glamor of the multiplicity of satura- 
tions and lightnesses creating “tints” 
and “shades,” and beautiful surface 
colors (present by grace of induced 
blackness), and resolutely hews to the 
hue circle, he finds that there are 
gaps in it. Nothing on the screen is 
really blue without admixture of 
greenness, and nothing is green without
contamination with blueness.
Only in casual viewing can one think
that some of the palest blue-greens
are blue; some of the darkest, green.
Land had considerable trouble making
a lemon look like a lemon with
red and white projection—he told
me that it took three months of trial
and error with taking filters, exposures,
and darkroom work.

What can be e[...]ous ratios-an[d-sums ...]
low illumina[tion ...] of object contours?
[...] whites being [...] [hi]gh and the [...]
(establishing the observer’s “white”
as his general chroma[tic adaptation]
level). Wherever about this same
ratio occurs but with a lower [...]
low sum, there will [...].
Upset the ratio [...] fairly high, and
there can be oranges and yellows.
Then, there will be blue-greens in- 
duced by a strong red-light content 
of surrounds, even when these are 
adapted out and appear white, or exhibit
induction colors themselves.
Some of the blue-greens will be 
greener, passing for green; some 
bluer, passing for blue. There will be 
purples, and blackness-containing 
synthetic surface colors—tans, 
browns, maroons, navy, drab, etc. 
What more could one want? What 
thunderstruck viewer of this marvel 
would think to ask whether anything 
in the copy had been sky-blue? Is 
it just a coincidence that Land's dem- 
onstration pictures have all been 
made indoors, and show objects so
selected?

Now for Land to say that none of 
this is described, explained, or pre- 
dicted by “classical theory” is to be 
a bit harsh toward centuries of a 
complex science and a legion of color 
scientists living and dead. As old as
any branch of color science is the
knowledge of induced hues—at low
saturations, to be sure, prior to
electric sources and projectors. They
were being exploited in the design
of Gobelin tapestries in Thomas
Young's lifetime. No one knows
how long ago mothers began to teach
their daughters what colors to avoid
in hats and scarves, so as not to affect
their facial complexions adversely.
“Accidental colors” were being mon- 
ographed when people were still 
arguing over whether three primaries 
were necessary and sufficient in 
painting and printing. 

It may be that none of Land’s 
technological predecessors, with two- 
beam, one-filter color systems, knew 
just why their systems worked so 
well. It may be that no one before 
Evans and Land had the photo- 
graphic facilities, the filters, and the 
projectors with which to take such 
full advantage of induction to produce
such breath-takingly beautiful
pictures with such saturations of the 
unexpected colors. But what needs 
to be explained is the long list of
hues that are there and the short 
list of those that are not there, and 
this can be done without making use 
of any part of Land’s explanation. 
One needs only old information and 
old language. In fact several languages
are available: that of simultaneous
contrast, that of spatial induction,
Evans's three kinds of
chromatic adaptation (1943, 1948), 
Helson’s principle of color conver- 
sion (1938, 1955), and so on. 

Land explicitly denies that any 
sort of chromatic adaptation is re- 
sponsible for his color arrays: “The 
colors .. . appear immediately, and 
do not alter appreciably with time, 
and we do not seek to explain them as 
effects of adaptation. To the best of 
our knowledge the gamut of color is 
much larger than that . . . predicted 
by any theory concerning chromatic 
adaptation” (Land, 1959a). He 
formerly demonstrated the immedi- 
acy of appearance of the colors by 
turning on strong room lights which 
obliterated the screen image, and 
then switching them off to show that 
“no visible time lapse” occurs before 
the full-color effect is seen. A newer 
device, which he demonstrated to 
me, is a double shutter with which 
both beams of his dual projector can 
be flashed to the screen. Even with 
only a 0.01-sec exposure, the colors 
are all there. Land considers that 
since inadequate time is being al- 
lowed for any adaptation process, 
adaptation cannot explain the colors. 

Land’s trouble here is that in any- 
thing he has published, he shows 
awareness of only the long, slow, intensitive
sorts of adaptation which
are regulated by the cold-molasses
kinetics of photopigment-concentration
change. The general and [lateral]
chromatic adaptations which underlie
contrast phenomena are inst[antaneous upon]
presentation of the stimuli ([which is]
why the contrast is called simultaneous).
They are more akin to the
alpha adaptation of Schouten and 
Ornstein (1939) than to the photo- 
chemical beta adaptation, and they 
are readily accomplished by the ex- 
ercise of inhibitions in purely chro- 
matic pathways, leaving brightness 
vision unaltered (see Walls, 1955). 
No one to my knowledge has had the 
hardihood to try to measure the 
tiny time it takes for induced colors 
to appear. Since the basis of spatial 
induction is neural and not photo- 
chemical, any time consumed is 
hardly more than is required for a 
neuronic message to traverse a couple 
of synapses. As a matter of fact, 
simple “persistence of vision” could 
allow a full manifestation of induc- 
tion effects even if the presentation 
time of the stimuli were cut below 
the period consumed in the induction 
itself. 

By rejecting adaptation, Land 
brings upon himself just as much 
trouble in explaining whites and 
grays, as hued areas. The less stray 
light there is, escaping from pro- 
jectors etc., the more surely any 
colored light or some ratio of any two 
lights, mixed on the screen, will appear
white. It is a familiar experience
that even in the ruby illumination
of a photographic darkroom, a
piece of white paper is subjectively
white. In Land’s projected compositions
there need be no broad white
backgrounds, and there need be no
[...] illuminated border [...]. Whatever
two lights he [...], the
observer’s “white” will be some intensitive
ratio of the two, whether
one light is colorimetrically white, or
even if both are colored. In any
case the average ratio over the
screen will be what the observer 
adapts to, and he will see white or 


gray in any discrete area that hap- 
pens to contain that ratio. The 
more heterogeneous the copy may 
have been chromatically, the more 
closely the observer's white will ap- 
proach the ratio coming out of the 
projectors if the slides are removed 
from them and the intensity controls 
(already adjusted for best verisimili- 
tude) are left untouched. 
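
The adaptation rule just described can be sketched numerically. This is a toy illustration, not a model taken from the literature: it assumes areas of equal size, takes the observer's "white" to be the plain mean of the red-light share over those areas, and uses an arbitrary `tolerance`. All intensity values are invented.

```python
def red_share(area):
    """Fraction of the light in an area that comes from the red beam."""
    red, white = area
    return red / (red + white)


def describe_areas(areas, tolerance=0.05):
    """Label each (red, white) area relative to the screen-average ratio.

    The mean red share over all areas stands in for the ratio the observer
    adapts to (an assumption of this sketch). Areas within `tolerance` of
    that mean are labeled neutral.
    """
    shares = [red_share(area) for area in areas]
    adapted = sum(shares) / len(shares)
    labels = []
    for share in shares:
        if abs(share - adapted) <= tolerance:
            labels.append("white or gray")
        elif share > adapted:
            labels.append("toward red")
        else:
            labels.append("toward blue-green")
    return adapted, labels


# Invented intensities for three equal areas, in arbitrary units.
adapted, labels = describe_areas([(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)])
```

Here the area whose ratio equals the screen average is labeled neutral although neither beam reaching it need be colorimetrically white. The labels for the other two areas follow the induction account given earlier in the text and are only qualitative.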

A conviction of Land's which is 
pregnant with several kinds of sig- 
nificance is that the correct hier- 
archical order of the colors in his pic- 
tures, the similarity of the color of 
each object to that of its model in 
the original copy, defies explanation 
under classical theory or in terms of 
any sort of chromatic-adaptation 
phenomena. In effect he says that 
when he began to use his unorthodox 
primaries he was not expecting close 
verisimilitude, and has never strained
to attain it, but it is there anyway.
It is the fact that he obtains a full
series of hues “from red to blue”
when using only red and yellow, or
yellow and green (etc.) lights, that
must seem the most magical aspect
of his results, to those who obtain
their scientific information from popular
magazines. This degree of verisimilitude
secured so cheaply, by
such simple means, is what piques
the cupidity of those who now contemplate
lower costs and higher
profits in color photography, color
television, and color printing, all as a
result of Land’s research. And, it
was this verisimilitude that originally
enticed Land far out on his
private theoretical limb. Seeing no
basis for it in the stimulus situation,
Land attributed the fullness of the
full-color gamut to a hitherto unsuspected
property of the eye itself.
His own further tests of his hypothesis
that “the eye can build colored
worlds of its own,” made with apparatus
other than simple lantern-slide
projectors, have served only to
entrench it in his mind. 

The fact that in Land’s potpourri
pictures the right colors end up, for 
the most part, in the right objects is 
only made to seem the more remark- 
able when one considers that all the 
hues he ever gets may also be ob- 
tained on a screen when the slides 
projected have stemmed from ma- 
terial that never was colored. One 
may make two montages of pieces 
of neutral gelatine filters cemented to 
glass plates, identical excepting that 
each piece in one montage is replaced 
in the other by a congruent piece of 
a different density. Projected 
through the same filter(s) Land 
might use, these will yield the same 
set of hues on the screen. Just as 
complete “full color” may even be
obtained with photographs, when 
these are of such character that there 
is no reason to expect any object to 
appear in any particular color. This 
is the case with “long” and “short” 
records photographed through a dead 
rat with two different wave length 
bands of X rays (and, of course, no 
color filters), then projected in reg- 
istry through color filters which 
“translate” the radiopacity pattern 
of the subject matter into a color 
pattern easier to study (see Mackay 
& Collins, 1957). 

If one could not even speak of
“verisimilitude” in colored X-ray
translations of heads and legs or in
colorized neutral montages, how is
it that the hues are not random in
Land's pictures also? What makes
them click into place in the correct
objects, unless Land is [right] in
thinking that the eye itself puts
them there because it is [...]
“natural image”? Sadly for Land's
hypotheses, the verisimilitude in his
pictures is just about as inevitable
as it is in Kodachrome. At the same
time, paradoxically, it is due to a 
coincidence. 

When Land assembles some objects
jects with hues ranging from red to 
blue and photographs them through 
red and green filters, then projects 
with red and yellowish lights, the 
long record of a red object is of low 
density and lets through intense red 
light, while the short record is dense 
and lets through little of the other 
light. Naturally the red object is 
rendered red. A really green or blue 
object is represented in the long 
record by a dense area and in the 
short record by a pale one. The 
shorter-waved (or the “white”’) light 
predominates in the screen locus of 
the object. This light however must 
of necessity appear blue-green or 
green-blue. This is not at all be- 
cause the original object had such a 
color, but because the stronger red 
illumination around the object-area 
of the screen induces its comple- 
ment, and this happens to be an am- 
biguous hue which satisfies the ob- 
server as a reproduction of either a 
green object or a blue one as the 
case may be. 
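
The density argument above can be put in numbers. What follows is a minimal sketch, not Land's calculation or the author's: it assumes the standard photographic relation between optical density D and transmittance, T = 10^(-D), and the densities and beam intensities are made-up illustrative values.

```python
def transmittance(density):
    """Fraction of light passed by a slide area of the given optical density."""
    return 10.0 ** (-density)


def screen_locus(long_density, short_density, red_beam=1.0, white_beam=1.0):
    """Return the (red, white) light reaching one screen locus.

    The long record is assumed to sit in the red projector and the short
    record in the "white" (unfiltered tungsten) projector, as in the text.
    """
    red = red_beam * transmittance(long_density)
    white = white_beam * transmittance(short_density)
    return red, white


# Hypothetical densities. A red object is pale in the long record and dense
# in the short record; a green or blue object is the reverse.
red_object = screen_locus(long_density=0.2, short_density=1.5)
green_object = screen_locus(long_density=1.5, short_density=0.2)
```

At the locus of the red object the red light predominates. At the locus of the green or blue object the "white" light predominates, which is the condition under which, on the account given in the text, the stronger red illumination around it induces a blue-green there.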

It is a beautiful coincidence, at 
work for Land, that the complement 
of his red projection light is a blue- 
green, so close to blue and so very
far from red in the spectral series of 
hues. With selected copy the result 
is almost full color, and the hasty 
cannot be blamed for calling it full 
color. Only pure blue lies beyond 
the series one does get, for violet is 
really a purple, the purples are extra- 
spectral anyway, and moreover 
Land’s method does afford some 
purples! The reason why pure blue 
is not obtainable is not that blue 
wave lengths coming from blue copy
pigments are stopped by the green
taking filter used to make the short
record; it is that the long-wave projection
light employed has a hybrid
hue, not a good blue, as its comple- 
ment. The projection lights, not the 
copy colors nor the taking filters, 
are the real determiners of the set of 
hues the screen can show. Land, 
however, thinks of them as the least 
important elements in the process, as 
we shall see. 

If anyone should fail to grasp the 
fact that the “nearly full” color of 
Land's pictures and their fair veri- 
similitude (excellent for long-wave 
colors; poorer and poorer—but never 
too bad—for shorter-wave ones) are 
the results of a beautiful and beauti- 
fying coincidence, then I would ask: 
Suppose the physiology of color vi- 
sion were such that the complement 
of red was, say, yellow? How good 
then would be the verisimilitude? 

Land’s idea that hue is, or can be, 
independent of wave length, and his 
idea that the obtaining of rich colora- 
tion from two “primaries” demands a 
completely new explanation, led him 
to conclude that “the eye can build 
colored worlds of its own out of in- 
formative materials that have always 
been supposed to be inherently drab 
and colorless.” When the retina is 
receiving only two kinds of light 
which are not very different from 
each other “the eye” (=the visual 
system) is apparently able to reas- 
sign hue-sensations, which “belong” 
[...] each arousing immutably a unique
elementary hue-quality irrespective 
of the wave length(s) of radiation its 
unique photochemical contents are 
absorbing. 

In order to determine how close 
together two projection lights illum- 
inating long and short records could 
be, i.e., how little of the Newtonian 
spectrum they could embrace and 
still give his full color, Land first con- 
templated using interference filters, 
with much narrower pass bands than 
gelatine ones have. Too much in- 
tensity would be sacrificed for spec- 
tral closeness; so, Land had a high- 
intensity dual monochromator de- 
signed and built. With this he could 
transilluminate either record with 
any wave length, seeing the two 
images in registry by means of a 45° 
half-silvered mirror. 

Land now found that with a wide 
variety of pairs of monochromatic 
lights, he could secure full color. It 
has to be remembered that Land 
uses the word “color” as if it meant 
hue. With wave lengths too close 
together the colors were very unsaturated,
and when the lights came
from certain regions of the spectrum
the gamut of hue itself was much restricted.
Land has claimed to get a
good result, however, even when the
two wave lengths were 579 mμ and
599 mμ. The color illustration prepared
by Scientific American (for
Land, 1959c) which purports to
show what then appears on the mirror
is very hard to believe.

One phenomenon appeared with 
the dual monochromator which ap- 
parently still puzzles Land, although 
he has promised a discussion of it. 
This is what he calls “short-wave reversal.”
When both of the transilluminating
wave lengths are short
ones, the very few hues he then gets
are in the reverse of a correct order;
that is, the coolest-colored copy-objects
come out in the least cool hues.
This corrects itself if he interchanges 
the records or the wave lengths so 
that a violet wave length illuminates 
the long record and a blue one the 
short. To my mind the reason for 
this is that violet has redness in it 
and blue does not, so that a violet 
wave length acts like a longer one 
when it is paired with a blue. 

The work with the dual mono- 
chromator appeared to support the 
theory that when only the extremes 
of a short segment of the whole 300-mμ
Newtonian spectrum are available
to the retina, the eye exhibits
“amazing versatility” and “can respond
with a full range of sensation”
(i.e., of hues), by reassigning all the
hues that belong to a 300-mμ stretch
to the wave lengths within a stretch
of only 20 mμ or 30 mμ. Somewhere
within such a short range there is a 
point which, Land believes, the eye 
establishes as a “fulcrum.” Wave 
lengths longer than this, no matter 
what their absolute values are, are 
seen with hues which we think of as 
belonging to the longwave end of the 
full spectrum, while wave lengths 
shorter than the balance point are 
responded to as if they were really 
short (blue-green, blue) even if they 
lie in midspectrum. 

Out of this reassignment theory
arose a corollary hypothesis which
Land has quite explicitly outlined
(1959c): Suppose the pigments of
things around us had very narrow
reflection-bands of wave lengths, so
that a number of such pigments could
be crowded into a short piece of the
present visible spectrum, say 30 mμ
or 40 mμ long. Suppose also [that]
something should happen to [the]
present solar spectrum [at the]
earth's surface—one [...]
light-absorbing materials gett[ing ...]—[so]
that only that [...]-mμ “spectrum” was
available to the retina. The colors of
the things around us, Land believes,
would be just the ones we see now.
The reassignment phenomenon prepares
us to be transported into any
one of many “color worlds.”

The basic idea of a “fulcrum” is 
not wholly wrong, provided one takes 
the position that it really expresses 
Land's own unconscious rediscovery 
of general chromatic adaptation. If 
one projects pictures or shadows 
with yellow and blue-green wave 
lengths and gets red and blue on the 
screen, the red sensation is not being 
“assigned to” the yellow wave length 
but is induced by the blue-green one, 
and the blue is not assigned to the 
blue-green but is induced by the 
yellow. One wave length would in- 
deed appear here to be a fulcrum— 
the dominant wave length of that 
Y-BG mixture which, through gen- 
eral chromatic adaptation, had be- 
come the observer's “white.” But 
the eye does not select this wave 
length out of what is coming into it, 
and use it in a particular way, for it
need not be present in either stimu-
lus.

With the original method using 
slide projectors, Land had found 
that “white” (unfiltered tungsten-) 
light could serve as a “primary” just 
as well as the purest of color-filtered 
lights. The dual monochromator con- 
tained a pair of hinged white sur- 
faces, each of which could be swung 
into position to occlude one of the 
diffraction gratings and reflect the 
undispersed tungsten-lamp light
through one slide to the 45° mirror 
and the observer's eye. This ar- 
rangement enabled Land to confirm 
that when shortwave monochromatic 
light came through the short record 
and white light through the long, the 
picture contained reds as though the 
white light were a long wave length. 
When white light was sent through 
the short record and long-wave mon-
ochromatic light through the long
one, the picture exhibited blues as if 
the white light were a short wave 
length. 

This led Land to pose the question, 
“what is the wave length which is 
equivalent to white, the wave length 
than which white is neither longer 
nor shorter?” Such wave length 
would be that which the eye employs 
as “the fulcrum in the ordinary, sun- 
lit world,” with respect to which the 
visual system assigns a redness sen- 
sory quality to the longest other 
wave length available and a blueness 
sensation to the shortest other wave 
length coming into the eye. With 
white light going through one record 
and monochromatic light through 
the other, the observer would find 
the fulcrum wave length as that one 
which made the picture go neutral or 
monotone all over.

All but two out of Land’s many
subjects, told to watch a red “object”
and rotate the wave length control
until it started to turn green, found
the “fulcrum” at λ588±2 mμ. Con-
sidering that the “white” light here
was again yellow tungsten-filament
radiation, it is immediately evident
... a surround, of the same hue, ...
had unusual ..., unusual luminosity
curves, or some such things. Both
had normal color vision, for I checked 
them myself with my own Nagel 
anomaloscope, which I had with me 
in Cambridge. 

The dual monochromator work 
made it appear that white light could 
give a sensation of red, blue, or any 
color depending upon what mono- 
chromatic light it was paired with; 
and it seemed to show that almost 
any wave length could arouse a 
sensation of almost any hue, depend- 
ing upon what other wave length, 
far enough away, it was paired with 
(Land, 1959b, 1959c). To demon- 
strate these things convincingly to a 
group, with something other than a 
one-man, eyepiece instrument, Land 
devised another apparatus in which 
he took advantage of the fact that 
one particular wave length of mono- 
chromatic light is easily available at 
very high intensity from an inex- 
pensive source: a standard sodium 
vapor highway lamp emits λ589 mμ.

In one of his two “sodium viewers”
(Land, 1959b, 1959c), a short-record
transparency with the objects en-
larged to natural size is transillumi-
nated by tungsten lamps and viewed
through a 45° half-silvered mirror,
and the long record (reversed right
for left) is transilluminated by a
sodium luminaire and seen in the
mirror, so that the two color-separa-
tion pictures are optically coplanar
... In the first viewer, λ589 mμ is longer
than the dominant wave length of the
... second viewer, λ589 mμ is shorter
than the dominant wave length of
the red long-record filter.

The picture of groceries which
Land presents with the green filter 
contains some objects partly or 
wholly red, in particular a vivid red 
can of pepper. Land asserts that 
here it is the λ589 mμ coming from
the pepper-can area which is arous-
ing the red sensation. In the other
viewer, where λ589 mμ is the shorter-
wave stimulus, some objects appear
in cool hues, notably a blue-green
book binding. Here, Land claims
that λ589 mμ is arousing that sensa-
tion. So, if one and the same wave 
length can “be” red at one time and 
green or blue at another, the thesis 
that the eye reassigns various hues 
to wave lengths to suit itself is 
abundantly proved. 

The sodium viewer experiment 
proves no such thing. Consider 
what is making the famous pepper- 
can red through the green filter. The 
can is dense in the short record, pale 
in the long, hence is represented by 
bright 589-light and dim green light. 
Since the surround of the can in the 
copy was not red, it is represented by 
relatively intense green light and 
feeble 589-light. The green induces a 
strongly red purple into the can area. 
Here the blueness element of the 
purple encounters the direct yellow- 
ness of the sodium light and they kill 
each other off (after the fashion of 
complementation), leaving the red- 
ness to be seen alone in the can. The 
key to what is really going on is given 
by the individual variation that 
transpired when I myself viewed this 
demonstration in Cambridge. For 
me, the pepper-can was a slightly
bluish red, a magenta. I did not de- 
velop enough yellowness to kill all 
the blueness that was there for me. 
Another person present called the 
can purely red. A third and fourth
called it orangeish-red—for them,
there was more than enough yellow- 
ness, or else less blueness in the hy- 
brid complement induced by the 
green light. What was varying here, 
of course, was the shape of one's per- 
sonal curve of “the intrinsic satura- 
tion of the spectrum.” 

There are other ways to describe 
the genesis of the red pepper-can; 
but they are all in old languages, 
requiring no supposition that “the
eye is revealed to be an instrument of
unsuspected and awe-inspiring sub-
tlety.” So far as I am concerned, the
red of the pepper-can is being seen
where the λ589 mμ is, but it is the
adapted-out green in the can’s sur-
round that is putting the red there.
The same surround would induce
redness into any other light substi-
tuted for the λ589 mμ, just so its di-
rect color was low enough in satura-
tion. In the red-filtered sodium
viewer, it is just as clear that any
green-blue or blue-green seen in any
object is only the expected induction
product of the more intense red
light in the surround, which readily
overwhelms the direct yellowness of
the λ589 mμ (yellow being the least
saturated locus in the spectrum).
Land, believing strongly as he does 
in his theory that color vision works 
by assignment of hues to available 
wave lengths simply according as they 
are longer or shorter, has been led to a 
philosophy about the evolution of 
human color vision which Bello 
(1959) quotes directly from one of 
Land’s 1957 public lectures: “Would 
it be possible, in a system which had 
only a pair of receptors balancing off 
against each other somehow, for the 
animal to have a sense of a full range 
of hues running from red, orange, 
yellow, green, down into blue? That 
animal could see the leopard in the 
leaves in a way that his competitors 
couldn’t. He would survive the leop-
ard... and go on to build the world
we live in.” Elsewhere (1959c) Land 
has said: 


... we have not been describing a two-color
theory of vision.... It is true, however, that
our experiments deal with two packages of in-
formation ... the eye can do almost every-
thing it needs to do with these two packages
... there is not a very big gap in the sensation
scale to be filled... the visual process will
remain an amazing one from the evolutionary
point of view. Why has a system that can
work so well with two packages of information
evolved to work better with three? ... we
feel that the big jump is obviously from one
to two (p. 99).


Translated into language which 
most color-vision researchers would 
use, this means that Land views the 
course of evolution of primate color 
vision thus: Originally, a single re- 
ceptor-type afforded only monotone 
or achromatic imagery. Upon the 
differentiation of this into two types 
the system became able to give to 
consciousness the hues from red to 
bluish blue-green, some purples, neu- 
trals, surface colors (containing 
blackness), all by assigning the ex- 
tremes of the hue series to the ex- 
tremes of any available stimulatory 
wave length band. Essentially the 
only hue missing was purest blue. It 
is hard to see why the organism went 
on to add 50% more mechanism (a
third receptor-type and all accessory
...) to secure this one more ...
sensory gamut; but this ...

The catch ... the first place, ...
any blueness into any sensation whether
by induction or otherwise, Man’s
present three.


GORDON L. WALLS 


these hue qualities, either singly or 
in blends of two, can get into sensa- 
tions—courtesy of induction—when 
only red and yellow (say) lights are 
impinging upon the retina. But these 
lights, or any two others, or any one 
light, activate all three components 
of the system. In an earlier, two- 
component stage of the system, no 
more hue qualities than red, possi- 
bly (but probably not) yellow, and 
green could have been enjoyed. 
Land (1959c) errs in saying that 
the central idea of the three-com- 
ponent theory of color vision is that 
“the eye responds to three different 
kinds of vibration, and all color 
sensation is the result of stimulating 
the three responses in varying degrees 
of strength.” He attributes this to 
Maxwell and Helmholtz, who in- 
stead believed that each receptor- 
type responds to some extent to every 
wave length in the visible spectrum. 
One has to go back to Thomas Young 
for the short-lived idea that each
receptor is activated primarily by
one bit of the spectrum (one “kind of
vibration”) and responds at all to
neighboring frequencies only in
“forced vibration.” Land seems to
think that if he uses two projection
... neither of which
has any blueness about it, he is
... yet have any ... all, ...
...ences perhaps 40 hues.⁵ Dichromatic
vision is “a big jump” above mono- 
chromatic, but it does not afford 
40 − (pure blue) = 39 hues. Even
though the color-normal sees many 
more colors in a Landian two-pro- 
jector picture than the simple stimuli 
seem to be worth, this would not be 
true for a dichromate. For him, any 
wave length to one side of a neutral 
point can induce only the same one- 
and-only other hue that he gets 
directly from any wave length on 
the other side of the neutral point. 
With long and short records pro- 
jected through red and green filters, 
a normal may see on the screen reds, 
oranges, yellows, chartreuse, greens, 
blue-greens, purples, and neutrals. A 
tritanope standing alongside would 
not see more than one kind of red 
and one kind of green, each at vari- 
ous saturations down to zero (neu- 
tral). 

While most of Land's color re- 
search thus has to be called much 
ado about nothing new, it would be 
a mistake to say that it is all just a 
flash in the panchromatic! Land’s 
“new coördinate system” and the
relationships he can extract from it 
are probably salvageable (although 
Judd’s 1940 formulations might be 
as fruitful in the projected-color situ- 
ations, or more so). It may well be 
that in the hands of Land and others 
the coördinate system will enable a
quantification and accurate predic- 
tion of induction products which 
never could have been had, or had 
easily, without it. This may be only 
my own wishful thinking, stimulated
by a considerable sympathy for
Land. I myself am not qualified to
quantify anything more complex
than my bank balance.

5 This has to be a guess. It is commonly
taught that there are about 120 discriminable
hues in an equal-brightness spectrum and 20-
30 extraspectral purples. There may be 120
values of Δλ, but each is a complex of hue and
saturation. The curve of “Δλ vs. λ” has never
been determined in an equal-saturation spec-
trum. It is not a “hue discrimination curve,”
but a wave length discrimination curve.

Land's coördinates are Cartesian,
but nothing else about them is con-
ventional. The ordinate axis is a
logarithmic scale of “percentage
available long-wavelength stimulus”
and the abscissa axis bears a log
scale of percentage available short
wave length stimulus. With long and
short records in projectors or in the
dual monochromator, and any two
projection lights, any point on the
screen finds a place also as a point
on a graph with these coördinates.


When the beams have been regu-
lated for best color array, the short-
record illuminant is cut off and the
maximal intensity of the long record
image is found and measured, with
this luminance then taken as 100%.
The relative intensity at any other
locus gives an ordinate value for
that locus. Abscissa values are found
in the isolated short-record image, as
percentages of the highest intensity
anywhere in the image. One point on
the graph may, of course, represent 
any number of image points where 
the same absolute luminances of the 
two kinds of light happen to occur. 
These various physically identical 
image loci should not exhibit the 
same color unless they all have identi- 
cal surrounds; but Land appears to 
claim that they do, since on a sample 
graph, given in all three of his papers, 
he labels single points with single 
color names. 
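
The construction of such a graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `land_coordinates` and the luminance readings are hypothetical, and every reading is taken to be a positive luminance measured in the isolated long-record or short-record image.

```python
import math

def land_coordinates(long_lum, short_lum):
    """Return (abscissa, ordinate) pairs on log-percentage axes.

    long_lum and short_lum hold the luminance at each screen point in the
    isolated long-record and short-record images (hypothetical readings).
    Each is expressed as a percentage of the maximum found in its own image.
    """
    long_max = max(long_lum)
    short_max = max(short_lum)
    points = []
    for l, s in zip(long_lum, short_lum):
        x = math.log10(100.0 * s / short_max)  # log % available short-wave stimulus
        y = math.log10(100.0 * l / long_max)   # log % available long-wave stimulus
        points.append((x, y))
    return points

# Made-up luminances (arbitrary units) at four screen points.
long_lum = [80.0, 40.0, 20.0, 40.0]
short_lum = [50.0, 25.0, 12.5, 5.0]
points = land_coordinates(long_lum, short_lum)
for x, y in points:
    print(round(x, 3), round(y, 3))
```

In this made-up example the first three points share one long-to-short ratio and so have equal abscissa and ordinate, while the fourth carries relatively more long-wave light and falls above them. Nothing in the computation knows about surrounds, which is the limitation discussed in the text.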

Each point on the graph repre- 
sents a relative intensity relative to 
another relative intensity, and is 
supposed to describe and predict a 
unique color on the screen. This has 
helped to make it appear to Land 
that “the eye turns brightness ratios 
into colors” (Bello, 1959), although, 
as pointed out above, he considers 
only ratios in spots and neglects
ratios in surrounds and ratios of spot
ratios to surround ratios. Land
emphasizes that his coördinates are
dimensionless, scaled in naked num-
bers. Actually, since a new graph
has to be made for every pair of
color-separation positives and for
every pair of lights used for every
picture, the coördinates could be
calibrated directly in log luminance.
It is hard for me to see why they are
not, for the percentage scales do not
generalize or universalize the graph.
The predictive values of any one
graph are confined within itself, for
that graph can tell nothing about
what colors will be seen, or where, in
another picture with other lights or 
even with the same lights. 

Land soon noticed that all of the 
whites, grays, and blacks on the 
screen plotted along a 45° line in his 
coördinate system. This is logical,
for on the 45° line the arithmetic 
ratio of the intensity of one light to 
the intensity of the other is a con- 
stant. If a screen point with this 
ratio and some sum appears neutral, 
any other point with the same ratio 
and another sum should appear 
neutral; and this ratio is the one
which ordinarily would exist all over
the screen if the slides were re-
moved from ... neces-
sarily plot above the 45° line,
cool colors below it ...
whence came
Land's description of the properties
of straight lines other than the 45°
one, on a graph in his coördinates.
He calls them “achromatic tracks”
and includes among these the or- 
dinate and abscissa axes themselves. 
Just as one has to know that when 
Land says color he means hue, so 
also one has to know that when he 
says achromatic he means achro- 
matic-or-monotone—he makes no 
distinction between a set of screen 
points which all have the same hue, 
and a set which all have no hue, Any- 
thing short of full color, he is likely 
to dismiss as no color. 
Among these “achromatic” straight-
line loci are the rotated or dis-
placed (but still straight) gray line
which results from doubling the con-
trast in one image (by using two
layers of short-record transparency);
and a line at right angles to the nor-
mal gray line. Such a line is the
locus of all that one can get on the
screen by projecting any negative
in registry with a positive made
from it. This is not necessarily an
achromatic (monotone) situation,
for the line may cross the gray line
and continue into the “cool” area:
although one ... positive is red-...
... intensity of ... would be ...
short-record ...
negative is always so dense that all
surrounds contain essentially only
the positive projector's kind of light
—Land convinced me of that by pro-
jecting a negative and positive of his
“standard objects slide” (illustrated
in Land, 1959a) where, since the
background in the copy was a mid
gray, it is about equally translucent
in both positive and negative.
Another mysteriously “achro- 
matic” situation is obtained by pro- 
jecting three identical positives made 
from any negative (color separation, 
or not) with two superimposed in 
one projector, the third alone in the 
other projector, and a color filter 
over each lens. The screen loci again 
receive an enormous number of ratios 
of the two kinds of projection light. 
Perhaps Land’s “doubled gamma” 
transparencies have always been too 
dense over most of their area, for I 
have learned of two other people, 
who have work in progress on this 
matter, who do get the various hues 
in the “three identical images” situa- 
tion. 
Thus, there are situations in which 
brightness ratios as such do not get 
“turned into colors” by our clever 
eyes. I don’t know why one doesn’t 
always get new colors when one mix- 
ture of two colored lights is sur- 
rounded by another mixture of the
same lights. In all of the situations
Land finds to be “achromatic” there
is a common feature, in that there
are regular progressions of ratios. 
This in itself is enough to lead him 
to expect no gaudy results. He justi- 
fies his use of elaborately-composed 
photographs to begin with (instead 
of “scientific” spots-and-surrounds 
and montages of “scientific” Mun- 
sell papers) by insisting that ran- 
domness of the projection light in- 
tensity ratios and of their topo- 
graphic distribution over the screen
is essential for the varicolored effect.
Such randomized configurations are,
he says, natural; and it is in natural
images that he has found the truth 
about color vision. 

One of the tests with which he 
“proves” the necessity of random- 
ness, and the color-killing effect of 
regularizing in any way the intensity- 
ratio array on the screen, is the ex- 
periment with neutral step wedges. 
With two such wedges in place of 
slides, one oriented horizontally in 
one projector and one vertically in 
the other, and with a red filter over 
one projector, one gets on the screen 
only a huge square which is red in 
one corner, white in the opposite 
corner, and pink in between. Yet, a 
great number of intensity-ratios of 
the lights are there (256, to be exact), 
just as in one of Land's beautiful 
long-and-short-record pictures. 

When I asked for black grids to be 
put on the wedges, which would 
themselves be in registry on the 
screen, and this was done, the screen 
was no longer monotone but ex- 
hibited several hues. Any illumi- 
nated square now had, nearest to it, 
other squares in which the ratios 
were different enough to be capable 
of inducing something into the first 
square. To my own satisfaction this 
showed that it is not randomness per 
se but discreteness that is essential, 
for the ratios in the squares were still 
in a regular topographic array. The 
virtue of Land's photographs of 
hodgepodges is that the easy way to 
get discreteness—local changes of 
ratio large enough to create contours 
—is by permitting randomness, which
is rife in just such photographs. 


And with this, I can strike my
... which is that I am ...
being able to ex... gets what he does
... old knowledge and ...
making no use
whatever of Land’s fantastic new hy-
potheses and n... 
it for others to explain why he
doesn’t get what he doesn’t get, i.e.,
why the “unexpected” colors are
lacking where he has come to expect
them, as in his various “achromatic”
situations. In each of these, I am
sure, some simple principle—which 
just happens to escape nonmathe- 
matical me—will explain the non- 
appearance of “full color.” 


REFERENCES 


Anonymous. New light on the eye. Archit.
Forum, 1958, 109(6), 124-128.

Bello, F. An astonishing new theory of color.
Fortune, 1959, 59(5), 144-206.

Cros, M. A bombshell in color theory.
Graphic Arts Mon., 1959, 31(6), 26-30; 204.

Evans, R. M. Visual processes and color
photography. J. Opt. Soc. Amer., 1943, 33,
579-614.

Evans, R. M. An introduction to color. New
York: Wiley, 1948.

Helson, H. Fundamental problems in color
vision: I. The principle governing changes
in hue, saturation, and lightness of non-
selective samples in chromatic illumina-
tion. J. exp. Psychol., 1938, 23, 439-476.

Helson, H. Color and seeing. Illum. Engin.,
1955, 50, 271-278.

Judd, D. B. Hue saturation and lightness of
surface colors with chromatic illumination.
J. Opt. Soc. Amer., 1940, 30, 2-32.

Land, E. H. Color vision and the natural
image. Part I. Proc. Nat. Acad. Sci., 1959,
45, 115-129. (a)

Land, E. H. Color vision and the natural
image. Part II. Proc. Nat. Acad. Sci., 1959,
45, 636-644. (b)

Land, E. H. Experiments in color vision.
Sci. Amer., 1959, 200(5), 84-99. (c)

Mackay, R. S., & Collins, C. C. Color X-
ray images and enhanced contrast. J. Biol.
Photog. Ass., 1957, 25, 114-118.

Schouten, J. F., & Ornstein, L. S. Meas-
urements on direct and indirect adaptation
by means of a binocular method. J. Opt.
Soc. Amer., 1939, 29, 168-182.

Walls, G. L. A branched-pathway schema
for the color-vision system and some of the
evidence for it. Amer. J. Ophth., 1955,
39(2, Pt. II), 8-23.

Weaver, W. Dither. Science, 1959, 130, 301.

Wilson, M. H., & Brocklebank, R. W. Com-
plementary hues of after-images. J. Opt.
Soc. Amer., 1955, 45, 293-299.


(Received July 17, 1959) 


Psychological Bulletin
Vol. 57, No. 1, 1960


THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS 
UNDERLYING THE t TEST¹


C. ALAN BONEAU 
Duke University 


As psychologists who perform in 
a research capacity are well aware, 
psychological data too frequently 
have an exasperating tendency to 
manifest themselves in a form which 
violates one or more of the assump- 
tions underlying the usual statistical 
tests of significance. Faced with the 
problem of analyzing such data, the 
researcher usually attempts to trans- 
form them in such a way that the 
assumptions are tenable, or he may 
look elsewhere for a statistical test. 
The latter alternative has become 
popular because of the proliferation 
of the so-called nonparametric or
distribution-free methods. These 
techniques quite generally, however, 
couple their freedom from restricting 
assumptions with a disdain for much 
of the information contained within 
the data. For example, by classifying 
scores into groups above and below 
the median one ignores the fact that 
there are intracategory differences 
between the individual scores. As a
result, tests which make no assump- 
tions about the distribution from 
which one is sampling will tend not 
to reject the null hypothesis when it 
is actually false as often as will those 
tests which do make assumptions. 
This lack of power of the nonpara-
metric tests is a decided handicap
when, as is frequently the case in
psychological research, a modicum
of reinforcement in the form of an
occasional significant result is re-
quired to maintain the research re-
sponse.

1 This project was undertaken while the
author was a Public Health Service Research
Fellow of the National Institute of Mental
Health at Duke University. The computa-
tions involved in this study were performed in
the Duke University Digital Computing Lab-
oratory which is supported in part by Na-
tional Science Foundation ... -665.
The author wishes to express his appreciation
to Thomas M. Gallie, Director of the Labora-
tory, for his cooperation and assistance.

Confronted with this discouraging
prospect and a perhaps equally dis-
couraging one of laboriously trans-
forming data, performing related
tests, and then perhaps having diffi-
culty in interpreting results, the re-
searcher is often tempted simply to
ignore such considerations and go
ahead and run a t test or analysis of
variance. In most cases, he is de-
terred by the feeling that such a
procedure will not solve the problem.
If a significant result is forthcoming,
is it due to differences between means,
or is it due to the violation of assump-
tions? The latter possibility is usually
sufficient to preclude the use of the
t or F test.

It might be suspected that one
could finesse the whole problem of
untenable assumptions by better
planning of the experiment or by a
more judicious choice of variables,
but this may not always be the case.
Let us examine the assumptions more
closely. It will be recalled that both
the t test and the closely related F
test of analysis of variance are pre-
dicated on sampling from a normal
distribution. A second assumption
required by the derivations is that
the variances of the distributions
from which the samples have been
taken is the same (assumption of
homogeneity of variance). ...
it is necessary ...
the test exhibit independent errors.
The third assumption is usually not 
restrictive since the researcher can 
readily conduct most psychological 
research so that this requirement is 
satisfied. The first two assumptions 
depend for their reasonableness in 
part upon the vagaries inherent in 
empirical data and the chance shape 
of the sampling distribution. Certain 
situations also arise frequently which 
tend to produce results having in- 
trinsic non-normality or heterogene- 
ity of variance. For example, early 
in a paired-associate learning task, 
before much learning has taken place, 
the modal number of responses for a 
group will be close to zero and any 
deviations will be in an upward direc- 
tion. The distribution of responses 
will be skewed and will have a small 
variance. With a medium number 
of trials, scores will tend to be spread 
over the whole possible range with a 
mode at the center, a more nearly 
normal distribution than before, but 
with greater variance. When the 
task has been learned by most of the 
group, the distribution will be skewed 
downward and with smaller variance. 
In this particular case, one would 
probably more closely approximate 
normality and homogeneity in the 
data by using some other measure, 
perhaps number of trials for mastery.
In many situations this option may
not be present.

There is, however, ...
the ordinary t and ...
apprised of ... Lind-
quist (1953) ...
results of a
study by Norton (1951). Norton’s
technique was to obtain samples of 
Fs by means of a random sampling 
procedure from distributions having 
the same mean but which violated 
the assumptions of normality and 
homogeneity of variance in prede- 
termined fashions. As a measure of 
the effect of the violations, Norton 
determined the obtained percentage 
of sample Fs which exceeded the the- 
oretical 5% and 1% values from the 
F tables for various conditions. If 
the null hypothesis is true, and if the 
assumptions are met, the theoretical 
values are F values which would be 
exceeded by chance exactly 5% or 
1% of the time. The discrepancy be- 
tween these expected percentages and
the obtained percentages is one use- 
ful measure of the effects of the viola- 
tions. 
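
A sampling procedure of this kind can be sketched in Python. The sketch below is not Norton's program and makes several assumptions that are not in the text: three groups of five scores each, an exponential population standing in for a skewed distribution, and Python's built-in random number generator. The constant 3.89 is the tabled 5% point of F for 2 and 12 degrees of freedom, which matches those assumed group sizes.

```python
import random

F_CRIT_05 = 3.89  # tabled 5% point of F with 2 and 12 df (3 groups of 5, assumed)

def f_ratio(groups):
    """One-way analysis of variance F ratio for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def exceedance_rate(draw, n_reps=2000, k=3, n=5, seed=1):
    """Proportion of sample Fs beyond the tabled 5% value when the null
    hypothesis is true (every group is drawn from the same population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        if f_ratio(groups) > F_CRIT_05:
            hits += 1
    return hits / n_reps

normal_rate = exceedance_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = exceedance_rate(lambda rng: rng.expovariate(1.0))
print(normal_rate, skewed_rate)
```

The quantity of interest is the gap between each printed proportion and the nominal 0.05. The proportions from any one run are themselves subject to sampling error, so small departures should not be over-read.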

Norton’s results may be summa- 
rized briefly as follows: (a) When the 
samples all came from the same 
population, the shape of the distribu- 
tion had very little effect on the per- 
centage of F ratios exceeding the 
theoretical limits. For example, for 
the 5% level, the percentages ex- 
ceeding the theoretical limits were 
7.83% for a leptokurtic population 
as one extreme discrepancy and 4.76% 
for an extremely skewed distribution 
as another. (b) For sampling from 
populations having the same shape 
but different variances, or having 
different shapes but the same vari- 
ance, there was little effect on the 
empirical percentage exceeding the- 
oretical limits, the average being be- 
tween 6.5% and 7.0%. (c) For sam- 
pling from populations with different 
shapes and heterogeneous variances, 
a serious discrepancy between theo- 
retical and obtained percentages oc-
curred in some instances. On the
basis of these results, Lindquist (1953,
p. 86) concluded that “unless the
heterogeneity of either form or vari-
ance is so extreme as to be readily
apparent upon inspection of the data, 
the effect upon the F distribution will 
probably be negligible.” 

This conclusion has apparently had 
surprisingly little effect upon the 
statistical habits of research workers 
(or perhaps editors) as is evident 
from the increasing reliance upon the 
less powerful nonparametric tech- 
niques in published reports. The 
purpose of this paper is to expound 
further the invulnerability of the t
test and its next of kin the F test to
ordinary onslaughts stemming from
violation of the assumptions of nor-
mality and homogeneity. In part,
this will be done by reporting results
of a study conducted by the author
dealing with the effect on the t test
of violation of assumptions. In addi-
tion, supporting evidence from a
mathematical framework will be used
to bolster the argument.

To temper any imputed dogmatism 
in the foregoing, it should be empha- 
sized that there are certain restric- 
tions which preclude an automatic 
utilization of the t and F tests with- 
out regard for assumptions even when 
these tests are otherwise applicable. 
It is apparent, for example, that the 
violation of the homogeneity of vari- 
ance assumption is drastically dis- 
turbing to the distribution of t's and 
Fs if the sample sizes are not the same 
for all groups, a possibility which was 
not considered in the Norton study. 
It also seems clear that in cases of 
extreme violations, one must have a 
sample size large enough to allow 
the statistical effects of averaging 
to come into play. The need for such
considerations will be made apparent
in the ensuing discussion. There is
abundant evidence, however, that
both the t and the F tests are much
less affected by extreme violations
of the assumptions than has been
generally realized.

A SAMPLING EXPERIMENT

At this point we will concern our- 
selves with the statement of the 
results of a random sampling study. 
The procedure is one of computing a 
large number of t values, each based
upon samples drawn at random from 
distributions having specified char- 
acteristics, and constructing a fre- 
quency distribution of the obtained 
t's. The present study was performed 
on the IBM 650 Electronic Com- 
puter programmed to perform the 
necessary operations which can be 
summarized as follows: (a) the gener- 
ation of a random number, (b) the 
transformation of the random num- 
ber into a random deviate from the 
appropriate distribution, (c) the suc-
cessive accumulation of the sums and
... of the random devi-
ates until the appropriate sample size
is reached, (d) the computation of a
... and sums of ...
and sign and the construction of a
frequency distribution based upon 
the sorting operation. The complete 
sequence of operations was performed 
internally, the end result, the fre- 
quency distribution of 1000 t's, being 
punched out on IBM cards. 
Comments on many of the above 
operations are relevant and will be 
made according to their order above. 
(a) The random numbers con- 
sisted of 10 digits, the middle 10 
digits of the product of the previously 
generated random number and of one 
of a sequence of 10 permutations of 
the 10 digits (0, 1, 2, ..., 9) placed as multipliers in the machine. To start the process it was necessary to place in the machine a 10-digit random number selected from a table of random numbers.

[...] of them [...] basis of the first 2 digits. A $\chi^2$ was computed to determine the fit of the obtained distribution to a theoretical one consisting of 100 scores in each of the 50 categories. The obtained $\chi^2$ of 47.83 is extremely close to the 49.332 value, which is the theoretical median of the $\chi^2$ distribution with 50 degrees of freedom.
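The randomness check described in (a) can be sketched as follows. This is only an illustration, not the original IBM 650 routine: Python's built-in generator stands in for the multiplicative scheme, 50 equal-width categories stand in for sorting on leading digits, and the function name `chi_square_uniformity` and the seed are assumptions made for the example.

```python
import random


def chi_square_uniformity(values, n_categories=50):
    """Chi-square statistic for the fit of values in [0, 1) to equal-frequency categories.

    Returns the statistic together with the observed category counts.
    """
    counts = [0] * n_categories
    for v in values:
        # Equal-width categories on [0, 1); this stands in for sorting on leading digits.
        index = min(int(v * n_categories), n_categories - 1)
        counts[index] += 1
    expected = len(values) / n_categories
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts


rng = random.Random(650)  # arbitrary seed, chosen only for reproducibility
numbers = [rng.random() for _ in range(5000)]  # 100 expected per category
chi2, counts = chi_square_uniformity(numbers)
print(f"chi-square = {chi2:.2f} over {len(counts)} categories")
```

A statistic near the middle of the reference $\chi^2$ distribution, as in the text, gives no reason to doubt the uniformity of the generated numbers.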

(b) In order to obtain the random 
deviates (the individual random 
scores from the appropriate popula- 
tion), the random numbers obtained 
in the above fashion were considered 
to be numbers between 0 and 1 and 
interpreted as the cumulative prob- 
ability for a particular score from the 
prescribed population. From a table
entered in the machine, a random 
deviate having that probability was 
selected. This is identical with the 
procedure one uses in entering the 
ordinary z table to determine the 
score below which, say, 97.5% of 
the scores in the distribution lie. The 
obtained value, 1.96, is the deviate 
corresponding to that cumulative 
percentage. The distribution of such deviates from [...] obtained by [...] sample of [...] distributed. [...] right) having a density function of $y = e^{-x}$, and the rectangular or uniform distribution. These distributions represent extremes of skewness and flatness to compare with the normal. The tables of deviates corresponding to each of the selected distributions were contained internally in the computer and were so arranged that the mean of each distribution was 0 and the variance 1. To verify these values, population means and variances based on samples of 5000 deviates from each of the three populations were estimated by the usual formulas. The results were for the normal distribution a sample mean of .0024 and a variance of 1.0118, for the exponential a mean of .0128 and a variance of 1.0475, and for the rectangular a mean of -.0115 and a variance of .9812. All of these results could quite easily have arisen from random sampling from distributions having the assumed characteristics. To change the size of the variance of the population, all deviates were multiplied when necessary by a constant, in this case, the number 2. The resulting distribution has a mean of 0 and variance of 4. The only variances used in this study were 1 and 4.

(c) The sample sizes selected were 5 and 15.

(d) The formula used for the computation of t was the following:

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$

where $M_1$ and $M_2$ are the means of the first and second samples and $N_1$ and $N_2$ are the respective sample sizes. This expression, or an equivalent statement of it, is found in any
statistics book and is undoubtedly 
employed in a preponderance of the 
research in which a t test involving nonrelated means is used. As pointed out in most statistics texts, this test is not appropriate when variances are different. Tests are available which are more or less legitimate under these conditions, but a certain amount of approximation is involved in them. It was felt, however, that the ordinary t test might under some conditions be as good an approximation as the more complex forms of t tests and that a verification of this notion was desirable. In addition, the above formula makes use of a pooled estimate of variance for the error term and in this respect is similar to the F test of analysis of variance. Because of this fact, certain results can be generalized from the t to the F test.
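As a concrete illustration of the pooled-variance t for independent groups discussed in (d), the following minimal sketch computes it directly from two lists of scores. The function name `pooled_t` and the example scores are hypothetical; only the arithmetic (difference of means over the square root of pooled variance times the sum of reciprocal sample sizes) is taken from the text.

```python
import math


def pooled_t(sample1, sample2):
    """Two-sample t for independent groups using a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    # Sums of squared deviations about each sample's own mean.
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / standard_error


# Hypothetical scores, used only to exercise the function.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(f"t = {pooled_t(group_a, group_b):.4f} with {len(group_a) + len(group_b) - 2} df")
```

Because the error term pools both samples' squared deviations, the same quantity reappears as the within-groups mean square when the comparison is cast as a two-group analysis of variance.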


To summarize, random samples were drawn from populations which were either normal, rectangular, or exponential with means equal to 0 and variances of 1 or 4. For several combinations of forms and variances, t tests of the significance of the difference between sample means were computed using combinations of the sample sizes 5 and 15. For each of these combinations, frequency distributions of sample t's were obtained on the IBM 650 Electronic Computer.
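The way a uniform random number is turned into a deviate from a chosen population can be sketched with inverse cumulative distribution functions in place of the stored tables. This is an assumption-laden illustration rather than the original program: the closed-form inverses, the function names, and the seed are all choices made here, with the exponential shifted and the rectangular scaled so that each population has mean 0 and variance 1.

```python
import math
import random
from statistics import NormalDist, fmean, variance

_STANDARD_NORMAL = NormalDist(0.0, 1.0)


def normal_deviate(p):
    """Score below which a proportion p of the unit normal distribution lies."""
    return _STANDARD_NORMAL.inv_cdf(p)


def exponential_deviate(p):
    """Density e^(-x) has mean 1 and variance 1; subtracting 1 centres it at 0."""
    return -math.log(1.0 - p) - 1.0


def rectangular_deviate(p):
    """Uniform on (-sqrt(3), sqrt(3)), which has mean 0 and variance 1."""
    return math.sqrt(3.0) * (2.0 * p - 1.0)


def draw_sample(deviate, n, rng, scale=1.0):
    """Draw n deviates; scale=2.0 multiplies each deviate by 2, giving variance 4."""
    sample = []
    while len(sample) < n:
        p = rng.random()
        if p > 0.0:  # the normal inverse is undefined at exactly 0
            sample.append(scale * deviate(p))
    return sample


rng = random.Random(1960)  # arbitrary seed for reproducibility
populations = [
    ("normal", normal_deviate),
    ("exponential", exponential_deviate),
    ("rectangular", rectangular_deviate),
]
for name, deviate in populations:
    check = draw_sample(deviate, 5000, rng)
    print(f"{name:12s} mean = {fmean(check): .4f}  variance = {variance(check):.4f}")
```

The printed means and variances play the same role as the verification on 5000 deviates reported in the text: they should lie close to 0 and 1, differing only by sampling fluctuation.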

RESULTS 

The results of the sampling study will be presented in part as a series of frequency distributions in the form of bar graphs of the obtained distribution of t's for a particular condition. Upon these have been superimposed the theoretical t distribution for the appropriate degrees of freedom. This furnishes a rapid comparison of the extent to which the empirical distribution conforms to the theoretical.

First we shall consider those combinations possible when both of the samples are from normal distributions [...] may vary. [...] the results of sampling from non-normal distributions, but [...] samples are from the same type of distribution. Finally we deal with the results of sampling from two different kinds of populations, for example one sample from the normal distribution, and another from the exponential.

Potentially, a very large number of 
such combinations are possible. Limi- 
tations of the time available on the 
computer necessitated a paring down 
to a reasonable number. Although 
the computer is relatively fast when 
optimally programmed, it neverthe- 
less required almost an hour, on the 
average, to complete a frequency 
distribution of 1000 t's. The combinations presented here are those
which seemed most important at the 
time the study was made. 

As a measure of the effect of violation of assumptions, the percentage of obtained t's which exceed the theoretical values delineating the middle 95% of the t distribution is used. For 8, 18, and 28 df which arise in the present study, the corresponding values are respectively ±2.262, ±2.101, and ±2.048. If the assumptions are met, and if the null hypothesis of equality of means is true, 5% of the obtained t's should fall outside these limits. The difference between this nominal value and the actual value obtained by sampling should be a useful measure of the degree to which violation of assumptions changes the distribution of t scores. There is, of course, a random quality to the obtained percentage of t's falling outside the theoretical limits. Hence, the obtained value should be looked upon as an approximation to the true value which should lie nearby.
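The measure just described, and the size of the sampling fluctuation to be expected in it, can be sketched as below. The function names are hypothetical, and the standard error uses the ordinary binomial formula under the assumption that the obtained t's are independent and that the true exceedance rate equals the nominal one.

```python
import math


def percent_outside(t_values, critical):
    """Percentage of obtained t's lying outside the limits -critical and +critical."""
    outside = sum(1 for t in t_values if abs(t) > critical)
    return 100.0 * outside / len(t_values)


def sampling_se_percent(nominal_percent, number_of_ts):
    """Binomial standard error, in percentage points, of an obtained exceedance percentage."""
    p = nominal_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / number_of_ts)


for number_of_ts in (1000, 2000):
    se = sampling_se_percent(5.0, number_of_ts)
    print(f"{number_of_ts} t's: standard error of the obtained percentage = {se:.2f} points")
```

On this reckoning an obtained percentage differing from the nominal 5% by about one percentage point is within roughly two standard errors for distributions of 1000 or 2000 t's.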

In the figures and in the text, the various combinations of population, variance, and sample size will be represented symbolically in the following form: E(0, 1)5-N(0, 4)15. Here the letters E, N, and R refer to the population from which the sample was drawn, E for exponential, N for normal, and R for rectangular. The first number in the parenthesis is the mean of the population distribution, in all cases zero, while the second number is the variance. The number following the parenthesis is the sample size for that particular sample. In the example above, the first sample is of size 5 from an exponential distribution having a variance of 1. The second sample is from a normal distribution with variance of 4 and the sample size is 15.

FIG. 1. Empirical distribution of t's from N(0, 1)5-N(0, 1)5 and theoretical distribution with 8 df. (Axes: frequency against value of t.)


Sampling from Normal Distributions 


In order to justify the random 
sampling approach utilized in this 
study, and partly to confirm the 
faith placed in the tabled values of 
the mathematical statisticians, the 
initial comparisons are between the 


theoretical distributions and the obtained distributions [...] inviolate. Figures 1 and 2 [...] the empirical [...] both samples are taken [...] normal distribution with zero mean and unit variance, designated N(0, 1). In Fig. 1 both samples are of size 5, while both are 15 in Fig. 2. The theoretical curves, one for 8 df, the other for 28 df, represent quite well the obtained distributions. Ordinates approximately two units from the mean of the theoretical distributions mark off the respective 5% limits for rejecting the null hypothesis.
In Fig. 1,2 5.3% of the obtained t's 
fall outside these bounds, while in 
Fig. 2 only 4.0% of the sample t's 
are in excess. Since in both cases 
the expected value is exactly 5%, we 
must attribute the discrepancy to 
random sampling fluctuations. The 
size of these discrepancies should be 
useful measures in evaluating the 
discrepancies which will be encountered under other conditions of sampling. For samples of 2000 t's a discrepancy as large as 1% from the
nominal 5% value evidently occurs 
frequently, and for this reason should 
not be considered as evidence to re- 
ject the theoretical distribution as an 
approximation to the empirical one. 

As an initial departure from the 
simplest cases just presented, Fig. 3 
compares theoretical and empirical 
distributions when samples are taken 
from the same N(0, 1) population, 
but the first sample size is 5, the second is 15, that is, N(0, 1)5-N(0, 1)15. While this in no sense is a violation of the assumptions of the t test, it is interesting to note that again sampling fluctuations have produced an empirical distribution with 4.0% of the t's falling outside the nominal 5% limits.

2 The numbers in the tails of some of the figures report the number of obtained t's falling outside the boundaries.

FIG. 2. Empirical distribution of t's from N(0, 1)15-N(0, 1)15 and theoretical distribution with 28 df. (Axes: frequency against value of t.)

Violation of the assumption of homogeneity of variance has effects as
depicted in Fig. 4. Here the ob- 
tained distribution is based upon 
two samples of size 5, one from N(0, 1) and the other from N(0, 4). The fit is again seen to be close between theoretical and empirical distributions, and 6.4% of the obtained t's exceed the theoretical 5% limits. By increasing the sample size to 15, a distribution results (not shown here) for which only 4.9% of the t's fall outside the nominal limits. It would seem that increasing the sample size produces a distribution which conforms rather closely to the t distribution. As will be seen later, this is a quite general result based upon mathematical considerations, the implications of which are important to the argument. For the moment it is evident that differences in variance at least in the ratio of 1 to 4 do not seriously affect the accuracy of probability statements made on the basis of the t test.

This last conclusion is true only so 
long as the size of both samples is the 
same. If the variances are different, 
with the present set of conditions 
there are two combinations of vari- 
ance and sample size possible. In 
one case the first sample may be of size 5 and drawn from the population with the smaller variance, while the second sample of size 15 is drawn from the population having the larger variance, N(0, 1)5-N(0, 4)15. In the second case the small sample size is coupled with the larger variance, the larger sample size with the smaller variance, N(0, 4)5-N(0, 1)15. The respective results of such sampling are presented in Fig. 5 and 6. The empirical distributions are clearly not approximated by the t distribution. For the distribution of Fig. 5, only 1% of the obtained t's exceed the nominal 5% values, while in Fig. 6, 16% of the t's fall outside those limits.

FIG. 3. Empirical distribution of t's from N(0, 1)5-N(0, 1)15 and theoretical distribution with 18 df. (Axes: frequency against value of t.)

FIG. 4. Empirical distribution of t's from N(0, 1)5-N(0, 4)5 and theoretical distribution with 8 df. (Axes: frequency against value of t.)
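The contrast between the two unequal-size cases can be reproduced in miniature with a short sampling experiment. The sketch below is not the original program: it uses Python's Gaussian generator, 4000 t's per condition, and an arbitrary seed, and the names `pooled_t` and `rejection_rate` are introduced here for illustration. The critical value 2.101 is the 18 df limit quoted in the text.

```python
import math
import random


def pooled_t(sample1, sample2):
    """Two-sample t for independent groups using a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))


def rejection_rate(n1, sd1, n2, sd2, critical, number_of_ts=4000, seed=1):
    """Proportion of t's outside +/-critical when both population means are 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(number_of_ts):
        first = [rng.gauss(0.0, sd1) for _ in range(n1)]
        second = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(first, second)) > critical:
            rejections += 1
    return rejections / number_of_ts


CRITICAL_18_DF = 2.101  # two-tailed 5% limit for 18 df, as quoted in the text

# Standard deviations 1 and 2 correspond to variances 1 and 4.
small_n_small_variance = rejection_rate(5, 1.0, 15, 2.0, CRITICAL_18_DF)  # N(0, 1)5-N(0, 4)15
small_n_large_variance = rejection_rate(5, 2.0, 15, 1.0, CRITICAL_18_DF)  # N(0, 4)5-N(0, 1)15
print(f"N(0, 1)5-N(0, 4)15: {100 * small_n_small_variance:.1f}% outside the nominal 5% limits")
print(f"N(0, 4)5-N(0, 1)15: {100 * small_n_large_variance:.1f}% outside the nominal 5% limits")
```

The exact percentages depend on the seed and the number of t's, but the direction of the effect (too few rejections in the first case, too many in the second) should match the pattern reported for Fig. 5 and 6.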

FIG. 5. Empirical distribution of t's from N(0, 1)5-N(0, 4)15 and theoretical distribution with 18 df. (Axes: frequency against value of t.)

FIG. 6. Empirical distribution of t's from N(0, 4)5-N(0, 1)15 and theoretical distribution with 18 df. (Axes: frequency against value of t.)

There are good mathematical reasons why a difference in sample size should produce such decided discrepancies when the variances are unequal. Recall that $\sum (X - M)^2/(N - 1)$ is an estimate of the variance of the population from which the sample is drawn. Hence, $\sum (X - M)^2$ will in the long run be equal to $(N - 1)\sigma^2$. The formula used in this study for computing t makes use of this fact and, in addition, under the assumption that the variances of the populations from which the two samples are drawn are equal, pools the sum of the squared deviations from the respective sample means to get a better estimate. That is, $\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2$ is an estimate of $(N_1 - 1)\sigma_1^2 + (N_2 - 1)\sigma_2^2$. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (homogeneity of variance), then the sums estimate $(N_1 + N_2 - 2)\sigma^2$. Hence,

$$\frac{\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2}{N_1 + N_2 - 2} \tag{1}$$

is an estimate of $\sigma^2$, [...] estimating procedure [...] variance used. For example, [...] the case N(0, 1)5-N(0, 4)15 [...] and $N_1 + N_2 - 2 = 18$. Using the appropriate values, Formula 1 has an expected value of $[(4 \cdot 1) + (14 \cdot 4)]/18 = 3.33$. [...] other situation, N(0, 4)5-N(0, 1)15, the [...] 1 is $[(4 \cdot 4) + (14 \cdot 1)]/18 = 1.67$. This means that on the average, the denominator for the t test will be larger
for the first case than for the second. 
If the sample differences between 
means were of the same magnitude 
for the two cases, obviously more 
“significant” t's would emerge when 
the denominator is smaller. It so 
happens that when this latter condi- 
tion exists, the variance of the numer- 
ator also tends to be greater than 
in the other condition, a fact which 
accentuates the differences between 
the two empirical distributions. 
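The arithmetic behind this argument can be laid out for both cases. The sketch below computes the expected pooled variance, the corresponding expected squared denominator of t, and the true variance of the difference between means; the function names are hypothetical and the calculation assumes only the standard result that the variance of a sample mean is the population variance divided by the sample size.

```python
def expected_pooled_variance(n1, var1, n2, var2):
    """Long-run value of the pooled variance estimate when population variances differ."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def numerator_variance(n1, var1, n2, var2):
    """True variance of the difference between two independent sample means."""
    return var1 / n1 + var2 / n2


cases = {
    "N(0, 1)5-N(0, 4)15": (5, 1.0, 15, 4.0),
    "N(0, 4)5-N(0, 1)15": (5, 4.0, 15, 1.0),
}
for label, (n1, var1, n2, var2) in cases.items():
    pooled = expected_pooled_variance(n1, var1, n2, var2)
    denominator_squared = pooled * (1.0 / n1 + 1.0 / n2)
    numerator = numerator_variance(n1, var1, n2, var2)
    print(
        f"{label}: pooled = {pooled:.2f}, "
        f"expected squared denominator = {denominator_squared:.3f}, "
        f"numerator variance = {numerator:.3f}, "
        f"ratio = {numerator / denominator_squared:.2f}"
    )
```

A ratio below 1 means the denominator overstates the variability of the numerator, so too few t's reach the tabled limits; a ratio above 1 means the opposite.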

Welch (1937) has shown mathe- 
matically that in the case of sample 
sizes of 5 and 15, a state which prevails here, the percentage of t's exceeding the nominal 5% value varies as a function of the ratio of the two population variances and can be as low as 0% and as high as 31.3%. If $N_1 = N_2$ there is never much bias, except perhaps in the case in which the sample sizes are both 2. For $N_1 = N_2 = 10$, the expected value of the percentage of t's exceeding the nominal 5% limits varies between 5% and 6.5% regardless of the difference between the variances. For larger sample sizes, the discrepancy tends to be even less.

Since the pooling procedure for estimating the population variance is used in ordinary analysis of variance techniques, it would seem that the combination of unequal [...] statements. That is, a combination of large variance and large sample size should tend to make the F test more conservative than the nominal value would lead one to expect, and, as with the t test, small variance and large sample size should produce a higher percentage of "significant" F's than expected. These conclusions are based upon a very simple extension to more than two samples of the explanation for the behavior of the t test probabilities with unequal sample sizes.
A more sophisticated mathematical handling of the problem by Box (1954a) reaches much the same conclusions for the simple-randomized analysis of variance. In a table in his article are given exact (i.e., mathematically determined) probabilities of exceeding the 5% point when variances are unequal. In this case, sampling is assumed to be from normal distributions. If the sample sizes are the same, the probabilities given range from 5.55% to 7.42% for several combinations of variances and numbers of samples. If, when variances are different, the samples are of different sizes, large discrepancies from the nominal values result. Combining large sample and large variance lessens the probability of obtaining a "significant" result to much less than 5%, just as we have seen for the t test. In a subsequent article, Box (1954b) presents some results from two-way analysis of variance. Since these designs generally have equal cell frequencies the results are not too far from expected. His figures [...] of the 5% value expected [...].

It would seem [...] empirically and [...] can be demonstrated only a [...] effect on the validity of probability statements caused by heterogeneity of variance, provided the sizes of the samples are the same. This applies to the F as well as the t test. If, however, the sample sizes are different, major errors in interpretation may result if normal curve thinking is used.


Sampling from Identical Non-Normal Distributions: (Equal Variances)

Let us now proceed to violate the other main assumption, that of normality of distribution from which sampling takes place. At this time we will consider the t distributions
arising when both samples are taken 
from the same non-normal distribu- 
tion. The distributions shown here 
and all subsequent ones, are based 
upon only 1000 t's, and hence will 
exhibit somewhat more column to 
column fluctuation than the preced- 
ing distributions. 

Figure 7 compares the theoretical 
t distribution and the empirical distribution obtained from two samples of size 5 from the exponential distribution, E(0, 1)5-E(0, 1)5. The fit is fairly close, but the proportion of cases in the tails seems less for the empirical distribution than for the theoretical. By count, 3.1% of the obtained t's exceed the nominal 5% values; that is, the test in this case seems slightly conservative. If both sample sizes are raised to 15 (distribution not shown here), the corresponding percentage is 4.0%. While this is probably not an appreciably better fit than for samples of size 5, we shall see later that there are theoretical reasons to suspect that increasing the sample size should better the approximation of
the empirical curve by the theoretical 
no matter what the parent popula- 
tion may be. 

If both samples are of size 5 from the same rectangular distribution, R(0, 1)5-R(0, 1)5, the result is as depicted in Fig. 8. The fit of theoretical curve to empirical data here is as good as any thus far observed. The percentage of obtained t's exceeding the 5% values is 5.1% in this particular case. For the case in which the sample sizes are both 15 (not shown here), the fit is equally good, with 5.0% of the cases falling outside of the nominal 5% bounds.

FIG. 7. Empirical distribution of t's from E(0, 1)5-E(0, 1)5 and theoretical distribution with 8 df. (Axes: frequency against value of t.)


Sampling from Non-Normal Distribu- 
tions: (Unequal Variances) 


We may assume that if the vari- 
ances are unequal, and at the same 
time the sample sizes are different, 
the resulting distributions from non- 
normal populations will be affected 
in the same way as the distributions 
derived from normal populations, 
and for the same reasons. These cases 
will not be considered. 

If sampling is in sizes of 5 from 
two exponential distributions, one 
with a variance of 1, and the other of 
4, a skewed distribution of obtained 
t's emerges (not shown here). We shall discover that a skewed distribution of t's generally arises when the sampling is from distributions which are different in degree of skewness or asymmetry. (For an explanation, see discussion of E(0, 1)5-N(0, 1)5 below.) Apparently, the effect of increasing the variance of the exponential distribution as in the present case, E(0, 1)5-E(0, 4)5, is [...] sample means [...] distribution with [...] was not tested with larger samples, but we shall see when comparing exponential and normal distributions that an increase in the sample size decreases the skew of the obtained t distribution there. Theoretically, this decrease should occur in almost all cases, including the present one.

FIG. 8. Empirical distribution of t's from R(0, 1)5-R(0, 1)5 and theoretical distribution with 8 df. (Axes: frequency against value of t.)

The result is much less compli- 
cated if, while variances are differ- 
ent, the sampling is from symmetrical 
rectangular distributions, R(0, 1)5-R(0, 4)5. For this small sample situation (not illustrated), there occurs a distribution of obtained t's having 7.1% of the values exceeding the nominal 5% points. This is roughly the same magnitude as the corresponding discrepancy from normal distributions. For the normal, it will be recalled that an increase of the sample sizes to 15 decreased the obtained percentage to 4.9%. There is no reason to believe that increasing the size of the rectangular samples would not have the same effect. However, time did not permit the determination of this distribution.


Sampling from Two Different Distributions


By drawing the first sample from 
a distribution having one shape, 
and by drawing the second from a 
distribution having another shape 
(other than shape differences arising 
from heterogeneity of variance), yet 
another way has been found to do 
violence to the integrity of the assumptions underlying the t test. Perhaps the least violent of these happenings is that in which at least one of the populations is normal.

When one sample is from the ex- 
ponential distribution and the other 
from the normal, the interesting result shown in Fig. 9 occurs. This is the small sample case, E(0, 1)5-N(0, 1)5. It will be recalled that for skewed distributions the mean and median are at different points. In the exponential distributions, for example, the mean is at the 63rd centile. If samples from the exponential distribution are small, there will be a tendency for the sample mean to be less than the population mean, obviously since nearly two thirds of the scores are below that mean. Since the population mean of the present distributions is 0, the result will be a preponderance of negative sample means for small samples. If the other sample is taken from a symmetrical distribution, which would tend to produce as many positive as negative sample means, the resulting distribution of obtained t's would not balance about its zero point, [...] exacerbated by small samples. In Fig. 9, 7.1% of the obtained cases fall outside the 5% limits, with [...] lying in the skewed tail. [...] increasing the sample size to 15 is to normalize the distribution [...]; the resulting curve, Fig. 10, is [...] approximated by the t distribution. One of the tails, however, does contain a disproportionate share of the cases, 4.2% to 0.9% for the other tail, or a [...] outside the nominal [...]. Nevertheless, the degree to which the theoretical and empirical distributions coincide under these conditions is striking. It seems likely that if both samples were each of size 25, the resulting sample distribution of t's would be virtually indistinguishable from the t distribution for 48 df, or the next best thing, the normal curve itself. To test this hypothesis, an additional empirical t distribution [...] sample sizes of 25 from these same exponential and normal populations was obtained (not shown here). The results nicely confirm the presumption. Comparison with the usual 5% values reveals 4.6% of the empirical t's surpassing them. Whereas with the smaller samples the ratio of t's in the skewed tail to those in the other tail is roughly 80:20, the corresponding ratio for the larger sample case is 59:41. Clearly, the increase in sample sizes has tended to normalize the distribution of t's.

FIG. 9. Empirical distribution of t's from E(0, 1)5-N(0, 1)5 and theoretical distribution with 8 df. (Axes: frequency against value of t.)

FIG. 10. Empirical distribution of t's from E(0, 1)15-N(0, 1)15 and theoretical distribution with 28 df. (Axes: frequency against value of t.)

For these conditions, involving rather drastic violation of the mathematical assumptions of the test, the t test has been observed to fare well with an adequate sample size. Such a state of affairs is to be expected
theoretically. By invoking a few 
theorems of mathematical statistics 
it can be shown that if one samples 
from any two populations for which 
the Central Limit Theorem holds, 
(almost any population that a psy- 
chologist might be confronted with), 
no matter what the variances may be, 
the use of equal sample sizes insures 
that the resulting distribution of t's
will approach normality as a limit. 
It would appear from the present re- 
sults that the approach to normality 
is rather rapid, since samples of sizes 
of 15 are generally sufficient to undo 
most of the damage inflicted by vio- 
lation of assumptions. Only in ex- 
treme cases, such as the last which 
involves distributions differing in 
skew, would it seem that slightly 
larger sizes are prescribed. Thus it 
would appear that the t test is functionally a distribution-free test, providing the sample sizes are sufficiently large (say, 30, for extreme violations) and equal.
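One way to see why equal sample sizes matter here, offered as a sketch and not necessarily the argument the author has in mind, is to compare the long-run value of the squared denominator of t with the true variance of its numerator when $N_1 = N_2 = n$. The symbols follow those used earlier for the pooled estimate; the only added assumption is that the two samples are independent.

```latex
\[
\frac{(n-1)\sigma_1^2 + (n-1)\sigma_2^2}{2n-2}
\left(\frac{1}{n} + \frac{1}{n}\right)
= \frac{\sigma_1^2 + \sigma_2^2}{2}\cdot\frac{2}{n}
= \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}
= \operatorname{Var}(M_1 - M_2).
\]
```

With equal sample sizes the pooled error term therefore targets the correct variance whatever the two population variances are, so that for large n the ratio behaves like a unit normal deviate. With unequal sample sizes the first equality fails unless the variances are equal.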

The distributions arising when 
sampling is from the normal and the 
rectangular distributions, N(0, 1)5-R(0, 1)5 and N(0, 1)15-R(0, 1)15, would further tend to substantiate this claim. The respective percentages exceeding the 5% nominal values are 5.6% and 4.6% from the empirical distributions for these cases, the distributions of t's being [...] exponential and rectangular distributions. [...] distribution (not shown) [...] with the effect of increase of sample size from 5 to 15 to cut down the skew and to decrease the percentage of cases falling outside the theoretical 5% values from 6.4% to 5.6%. For those cases falling outside the nominal 5% values, the ratio is 79:21 for the smaller samples. This is changed to 69:31 for the sample size of 15. Here again it would seem that larger sample sizes would be required to insure the validity of probability statements utilizing the t distribution as a model.
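The asymmetry that appears whenever one sample comes from the exponential population traces back to the tendency of small exponential samples to have means below the population mean. The sketch below computes that probability exactly for the density $e^{-x}$, using the standard fact that the sum of n such scores follows a gamma (Erlang) distribution; the function name is hypothetical and the calculation is an illustration, not part of the original study.

```python
import math


def prob_mean_below_population_mean(n):
    """P(sample mean < population mean) for n independent scores with density e^(-x).

    The sum of n such scores is Gamma(n, 1), and the population mean is 1, so the
    event is {sum < n}. The Erlang distribution function gives
    P(sum < n) = 1 - sum_{k=0}^{n-1} e^(-n) n^k / k!.
    """
    upper_tail = sum(math.exp(-n) * n ** k / math.factorial(k) for k in range(n))
    return 1.0 - upper_tail


for n in (1, 5, 15, 25):
    print(f"n = {n:2d}: P(sample mean below population mean) = {prob_mean_below_population_mean(n):.3f}")
```

For a single score the probability is the 63rd centile figure mentioned earlier, and it moves toward one half as the sample size grows, which is consistent with the reduction in skew observed when the samples are enlarged.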

The results of the total study are 
summarized in Table 1 which gives 
for each combination of population, 
variance, and sample size (a) the percentage of obtained t's falling outside the nominal 5% probability limits of the ordinary t distribution, and (b) the percentage of obtained t's falling outside the 1% limits. The combinations are represented symbolically as before. The table is divided into two parts, the first part presenting information on the empirical distributions which are intrinsically symmetrical. The second part is based upon the intrinsically nonsymmetrical distributions, additional information in this section of the table being the percentage of obtained t's falling in the larger of the tails. The percentage for the smaller tail may be obtained by subtraction of the percentage in the larger tail from the total.

Certain implications of the table should be discussed. In the Norton [...] significance levels of 1% and .1% than appeared [...]. The inclusion in [...] percentages of obtained t's falling outside the nominal 1% values makes possible the comparison of the 1% and 5% results. The 1% values seem to be approximately what would be expected considering that sampling fluctuations are occurring. It was not felt feasible to determine the results for the .1% level since with only 1000 or 2000 cases the number of obtained t's falling outside the prescribed limits was negligible in most cases. It is possible, however, that the distortions in the apparent level of significance are more drastic for the smaller α values.

All the results and discussion have been limited thus far to the two-tailed t test. With notable exceptions, the conclusions we have reached can be applied directly to the one-tailed t test as well. The exceptions involve those distributions which are intrinsically asymmetric (see Table 1). In these distributions a preponderance of the obtained t's fall in one tail. Depending upon the particular tail involved in the one-tailed test the use of t should produce too many or too few significant results when sampling is from a combination of populations from which an asymmetric t distribution is expected. It seems impossible to make any simple statements about the behavior of the tails in the general case of asymmetric t distribution except to say that such distributions are expected whenever the skew of the two parent populations is different. The experimenter must determine for each particular instance the direction of skew of the expected distribution and act accordingly. Table 1 gives for the intrinsically asymmetric distributions the total percentage of obtained t's falling outside the theoretical 5% and 1% limits and the percentage in the larger tail. From these values can be assessed the approximate magnitude of the bias incurred when a one-tailed test is used in specific situations.

DISCUSSION AND CONCLUSIONS

Having violated a number of assumptions underlying the t test, and [...] that, by and large, such violations [...] a minimal effect on [...] robust test in [...] the word. [...] by Box (1953) to characterize statistical tests which are only inconsequentially affected by a violation of the underlying assumptions. [...] statistical test is in part a test of the assumptions upon which it is based. For example, the null hypothesis of a
particular test may be concerned 
with sample means. If, however, the 
assumptions underlying the test are 
not met, the result may be “signifi- 
cant” even though the population 
means are the same. If the statisti- 
cal test is relatively insensitive to 
violations of the assumptions other 
than the null hypothesis, and, hence, 
if probability statements refer primarily to the null hypotheses, it is said to be robust. The t and F tests
apparently possess this quality to a 
high degree. 

In this particular context, an im- 
portant example of a test lacking 
robustness is Bartlett's test for 
homogeneity of variance (Bartlett, 
1937). Box (1953) has shown that 
this test is extremely sensitive to 
non-normality and will under some 
conditions be prone to yield “signifi- 
cant” results even if variances are 
equal. For example, Box tables a 
number of exact probabilities of ex- 
ceeding the 5% normal theory sig- 
nificance level in the Bartlett test 
for various levels of $\lambda_2$, the kurtosis parameter, for different quantities of variances being compared. As an extreme case, if $\lambda_2 = 2$ (i.e., a peaked distribution) with 30 variances being tested, the probability of rejecting the hypothesis at the nominal .05 level is actually .849. If $\lambda_2 = -1$ (i.e., a flat distribution), the probability is .00001. Note that in both these
cases, all variances are actually 
equal. Box, realizing that in the case 
of equal sample sizes the analysis of 
variance is affected surprisingly little 
by heterogeneous variance and non- 
normality, concludes that the use of 
the nonrobust Bartlett test to “make 
the preliminary test on variances is 
rather like putting out to sea in a 
rowing boat to find out whether con- 
ditions are sufficiently calm for an 
ocean liner to leave port!” Appar- 
ently, as reported in this same arti- 
cle, other commonly used tests for 
evaluating homogeneity are subject 
to the same weakness. 
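
The sensitivity Box describes can be checked informally by simulation. The sketch below is an illustration only and does not reproduce Box's calculations: it assumes three groups of 20 observations, uses a Laplace population as a stand-in for a peaked distribution and a uniform population as a stand-in for a flat one, and takes 5.991 as the tabled .05 critical value of chi-square with 2 degrees of freedom. All function and variable names are hypothetical.

```python
import math
import random

# Tabled .05 critical value of chi-square with k - 1 = 2 degrees of freedom.
CHI2_CRIT_05_DF2 = 5.991


def bartlett_statistic(groups):
    """Bartlett's statistic for homogeneity of variance across several groups."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((v - mean) ** 2 for v in g) / (len(g) - 1))
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction


# Illustrative populations (assumptions, not the populations tabled by Box).
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "peaked (Laplace)": lambda rng: rng.expovariate(1.0) - rng.expovariate(1.0),
    "flat (uniform)": lambda rng: rng.uniform(-1.0, 1.0),
}


def rejection_rate(draw, k=3, n=20, reps=2000, seed=7):
    """Share of samples rejected at the nominal .05 level; all variances are equal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        if bartlett_statistic(groups) > CHI2_CRIT_05_DF2:
            hits += 1
    return hits / reps


rates = {name: rejection_rate(draw) for name, draw in SAMPLERS.items()}
for name, rate in rates.items():
    print(f"{name:18s} {rate:.3f}")
```

Under these assumptions the normal case should reject near the nominal 5%, the peaked case far more often, and the flat case far less often, which is the direction of the effect reported in the text.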

We may conclude that for a large number of different situations confronting the researcher, the use of the ordinary t test and its associated table will result in probability statements which are accurate to a high degree, even though the assumptions of homogeneity of variance and normality of the underlying distributions are untenable. This large number of situations has the following general characteristics: (a) the two sample sizes are equal or nearly so, (b) the assumed underlying population distributions are of the same shape or nearly so. (If the distributions are skewed they should have nearly the same variance.) If these conditions are met, then no matter what the variance differences may be, samples as small as five will produce results for which the true probability of rejecting the null hypothesis at the .05 level will more than likely be within .03 of that level. If the sample size is as large as 15, the true probabilities are quite likely within .01 of the nominal value. That is to say, the percentage of times the null hypothesis will be rejected when it is actually true will tend to be between 4% and 6% when the nominal value is 5%.
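
A small sampling experiment in the spirit of this conclusion can be sketched as follows. It is an illustration under stated assumptions, not a reproduction of the sampling study summarized above: both populations are normal with mean zero, the standard deviations are 1 and 2, the sample sizes are equal, and the two-tailed .05 critical values (2.306 for 8 df, 2.048 for 28 df) are copied from a standard t table. The function names are hypothetical.

```python
import math
import random

# Two-tailed .05 critical values of Student's t from a standard table,
# keyed by degrees of freedom (df = 2n - 2 for two samples of size n).
T_CRIT_05 = {8: 2.306, 28: 2.048}


def pooled_t(x, y):
    """Ordinary two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rejection_rate(n, sd1, sd2, reps=4000, seed=1):
    """Share of samples in which a true null hypothesis is rejected at .05."""
    rng = random.Random(seed)
    crit = T_CRIT_05[2 * n - 2]
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n)]
        y = [rng.gauss(0.0, sd2) for _ in range(n)]
        if abs(pooled_t(x, y)) > crit:
            hits += 1
    return hits / reps


# Equal sample sizes, population variances in the ratio 1:4.
rate_n5 = rejection_rate(n=5, sd1=1.0, sd2=2.0)
rate_n15 = rejection_rate(n=15, sd1=1.0, sd2=2.0)
print(f"n = 5 per group:  {rate_n5:.3f}")
print(f"n = 15 per group: {rate_n15:.3f}")
```

With equal sample sizes the observed rejection rates should stay close to the nominal 5% despite the unequal variances, the larger samples staying closer.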

If the sample sizes are unequal, one is in no difficulty provided the variances are compensatingly equal. A combination of unequal sample sizes and unequal variances, however, automatically produces inaccurate probability statements which can be quite different from the nominal values. One must in this case resort to different testing procedures, such as those by Cochran and Cox (1950), Satterthwaite (1946), and Welch (1947). The Welch procedure is interesting since it has been extended by Welch (1951) to cover the simple randomized analysis of variance which suffers the same defect as the t test when confronted with both unequal variance and unequal sample sizes. The Fisher-Behrens procedure suggested by many psychologically oriented statistical textbooks has had its validity questioned (Bartlett, 1936) and, hence, is ignored by some statisticians (e.g., Anderson & Bancroft, 1952, p. 82).
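
The contrast between the ordinary t test and a separate-variance procedure can be sketched in the same way. The code below is a hedged illustration: it uses the familiar separate-variance t statistic with Satterthwaite's approximate degrees of freedom as a stand-in for the procedures cited, and it does not claim to follow any one of those papers exactly. The sample sizes (5 and 15), the standard deviations (2 and 1), and every function name are assumptions made for the example. Tail probabilities are obtained by numerically integrating the t density, so no table is needed for fractional degrees of freedom.

```python
import math
import random


def t_two_tailed_p(t, df, steps=200):
    """Two-tailed p value for Student's t by Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def mean_var(x):
    """Sample mean and unbiased sample variance."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)


def pooled_p(x, y):
    """p value of the ordinary pooled-variance t test."""
    n1, n2 = len(x), len(y)
    (m1, v1), (m2, v2) = mean_var(x), mean_var(y)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_two_tailed_p(t, n1 + n2 - 2)


def separate_variance_p(x, y):
    """p value of the separate-variance t test with Satterthwaite's df."""
    n1, n2 = len(x), len(y)
    (m1, v1), (m2, v2) = mean_var(x), mean_var(y)
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))
    return t_two_tailed_p(t, df)


def rejection_rates(n1=5, sd1=2.0, n2=15, sd2=1.0, reps=2000, seed=3):
    """Rejection rates at .05 when the null hypothesis of equal means is true."""
    rng = random.Random(seed)
    pooled_hits = separate_hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        pooled_hits += pooled_p(x, y) < 0.05
        separate_hits += separate_variance_p(x, y) < 0.05
    return pooled_hits / reps, separate_hits / reps


pooled_rate, separate_rate = rejection_rates()
print(f"pooled t test:            {pooled_rate:.3f}")
print(f"separate-variance t test: {separate_rate:.3f}")
```

When the smaller sample comes from the more variable population, as assumed here, the pooled test should reject a true null hypothesis well above the nominal rate, while the separate-variance test should stay near it.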

If the two underlying populations are not the same shape, there seems to be little difficulty if the distributions are both symmetrical. If they differ in skew, however, the distribution of obtained t's has a tendency itself to be skewed, having a greater percentage of obtained t's falling outside of one limit than the other. This may tend to bias probability statements. Increasing the sample size has the effect of removing the skew, and, due to the Central Limit Theorem and others, the normal distribution is approached by this maneuver. By the time the sample sizes reach 25 or 30, the approach should be close enough that one can, in effect, ignore the effects of violations of assumptions except for extremes. Since this is so, the t test is seen to be functionally nonparametric or distribution-free. It also retains its power in some situations (David & Johnson, 1951). There is, unfortunately, no guarantee that the t and F tests are uniformly most powerful tests. It is possible, even probable, that certain of the distribution-free methods are more powerful than the t and F tests when sampling is from some unspecified distributions or combination of distributions. At present, little can be said to clarify the situation. Much more research in this area needs to be done.

Since the t and F tests of analysis of variance are intimately related, it can be shown that many of the statements referring to the t test can be generalized quite readily to the F test. In particular, the necessity for equal sample sizes, if variances are unequal, is important for the same reasons in the F test of analysis of variance as in the t test. A number of the cited articles have demonstrated both mathematically and by means of sampling studies that most of the statements we have made do apply to the F test. It is suggested that psychological researchers feel free to utilize these powerful techniques where applicable in a wider variety of situations, the present emphasis on the nonparametric methods notwithstanding.

REFERENCES

ANDERSON, R. L., & BANCROFT, T. A. Statistical theory in research. New York: McGraw-Hill, 1952.

BARTLETT, M. S. The effect of non-normality on the t-distribution. Proc. Camb. Phil. Soc., 1935, 31, 223-231.

BARTLETT, M. S. The information available in small samples. Proc. Camb. Phil. Soc., 1936, 32, 560-566.

BARTLETT, M. S. Properties of sufficiency and statistical tests. Proc. Roy. Soc. (London), 1937, 160, 268-282.

BOX, G. E. P. Non-normality and tests on variances. Biometrika, 1953, 40, 318-335.

BOX, G. E. P. Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Ann. math. Statist., 1954, 25, 290-302. (a)

BOX, G. E. P. Some theorems on quadratic [...]

Since the advent of projective techniques, psychologists have devoted considerable energy to the creation and utilization of various methods for assessing the “deeper layers” of personality. The impressive results often obtained with these instruments have led to a proliferation of projective methods, most of which are greeted with enthusiasm and validated by endorsement. The early, uncritical acceptance of projective testing produced studies which were mainly concerned with the characteristics of such clinical populations as delinquents, psychotics, and organics. In general, the assumption was made that a projective test was as single-minded as the X ray (Frank, [...] revealing information only [...] without in any [...] who adminis[...] method of administration, or the situation in which it was used. [...] Cronbach (1956), “Test research has been dominated [...] view that the test i[...] subject’s responses [...] nonpersonal stimulus” (p. 175).

However, with growing sophistication regarding the nature of psychological testing, a number of writers have explicitly commented on the influence in projective testing of factors other than the S's personality. One of the first to delineate the subjective factors in Rorschach testing was Schachtel (1945) who described four common elements in the Rorschach situation: the relationship of the E and S; the assignment of the task by the E to the S; the E's need [...]; [...] the task, such as the ambiguity or the lack of familiarity with the stimuli. Miller (1953), Sarason (1954), and Luchins (1947) have also indicated the subtle ways in which subjective forces may influence the course of a projective testing situation.

It is the purpose of this paper to review the considerable evidence regarding situational and interpersonal influences in projective testing. Experiments dealing with the effects of these factors on interviewing or intelligence testing and those studies concerning the effects of psychotherapy, psychosurgery, and individual vs. group tests will not be reviewed, since these issues are considered to be part of essentially different problems. Four arbitrary, overlapping categories will be used to present these studies: method of administration, the testing situation, examiner influence, and subject influence.


THE INFLUENCE OF METHOD
OF ADMINISTRATION


Instructions to make a good or bad impression. The ability of projective tests to withstand attempts by Ss to disguise or alter their “real” responses has been investigated several times. The usual procedure in these studies is to test the same Ss several times under varying instructions; the test responses produced under standard instructions are then compared with those yielded by experimental instructions.

Fosberg (1938, 1941, 1943) has reported on the process of trying to produce a good or bad impression on the Rorschach. In one study (Fosberg, 1938) a husband and wife were each administered the Rorschach under four sets of instructions. Despite the instructions to create a given impression, it was concluded that the Ss were unable to avoid revealing basic aspects of their personality, the psychograms on the four examinations remaining essentially the same. In a later study (Fosberg, 1941), the Rorschach was given four times under different instructions to 25 male and 25 female Ss. A special experimental group of 16 Ss took the fourth examination under instructions to look for particular determinants. When Fosberg compared group means produced under the different instructions he found few consistent differences. There was little change in the test as a whole, only the content of the responses showing marked changes. One reason for the failure of these instructions to produce differences in responses is that each S defined for himself the manner in which to deceive the E, so that six Ss increased their responses in order to make a good impression while four increased their responses in order to make a poor impression. While there was no consistency among all Ss on how to create an artificial image for the E, most Ss felt that they could falsify their reactions by adop[...] special impo[...] the fact that [...] ornery, two proceeded very slowly, and four paid little attention (Fosberg, 1943).

Fosberg’s studies were essentially repeated by Carp and Shavzin (1950) who also found that taking the Rorschach under instructions to make a good impression (“you are in a state hospital and the results on this test may help get you out”) and under instructions to make a bad impression (“you are to be drafted for the Army and the results on this test may help keep you out”) produced no significant group differences (except for the z score, significant at the .05 level). “This does not mean, however, that no changes were produced. The data clearly showed the differences. But the direction taken was so diverse, among the individual Ss, that they were balanced out in the analysis” (p. 232). The authors directly challenged Fosberg’s conclusions that the Rorschach could not be manipulated by the Ss. “On the contrary, this study shows that there are some subjects who can manipulate their responses, who can vary their personality picture as reflected by the Rorschach, under instructions to make ‘good’ or ‘bad’ impressions” (p. 233).

Weisskopf and Dieppa (1951) administered three cards of the TAT to hospitalized psychoneurotic veterans, giving the standard instructions in one administration, asking the Ss to give the best possible impression on another administration and the worst possible impression on the third administration. Of the nine dimensions rated by judges, five showed significant differences as a function of the instructions. When the Ss tried to give their worst impression, they were rated as less well-adjusted, more hostile, less willing to conform, and more spontaneous. Wallon and Webb (1957) asked naval aviation cadets to take the Rosenzweig P-F test and a sentence completion test under several variations of instructions and test structure. One group took the P-F test in a multiple choice form, another took it in standard form, while a third group was told in taking the test to try to make the best impression. It was concluded that as the test became less ambiguous, the results more closely resembled responses produced under instructions to fake.

Instructions emphasizing particular determinants or locations. In the process of studying suggestibility, Coffin (1941) demonstrated how S's set may influence his responses to the Rorschach test. The Ss were first asked to read a fictitious article by a “Harvard professor” describing how professional men usually saw whole responses, while business men saw animals, skilled laborers saw inanimate objects and WPA employees saw details. A second group of Ss read the article that now described professional men as seeing details, business men inanimate objects, etc. Following the reading of this article each S was administered six Rorschach cards. The results clearly showed the influence of the suggestion on the responses, each group tending to respond in the same direction as the socially acceptable norms. “Apparently the suggestion sets up a determining tendency operating upon the observer’s perceptual and imaginal processes. This acted to direct the ‘search’” (p. 62).
In a better controlled study Abramson (1951) equated two groups of college students on the basis of W, D, and Dd responses to the first administration of the Rorschach. One group of [...] successful business and [...] men saw whole responses, [...] second group was instructed [...] these men saw detail responses. As a consequence of the difference in in[...] the two groups differed [...] test not only in [...] and detail resp[...] other determinants (F%, FM, m, Hd, Ad) as well, although there were no significant differences between groups on the first test. Evidently establishing a set for area will also affect those determinants dependent upon the area of the blot. The evidence of Keyes (1954) also supported the notion that the number of whole responses can be influenced by special instructions.

Hutt, Gibby, Milton, and Pottharst (1950) and Gibby (1951) investigated the effects of instructing Ss to pay particular attention to specific aspects of the Rorschach blots. In each study various parts of the blots were emphasized to the Ss after the standard administration of the test, but prior to the experimental administrations. A study of test-retest reliability of the Rorschach under these conditions showed certain determinants to be more stable and more resistant to change than others. “What appears to be crucial is how the individual perceives the total test situation. If we do not know this, we are likely to make serious errors in interpretation” (p. 185).

After first administering the Rorschach under standard instructions, Fabrikant (1954), in his instructions to the experimental group, stressed movement, color, shading, and texture responses. In the experimental group, 15 Ss showed differences between first and second administrations in at least 3 of the 4 response categories, while in the control group only 3 Ss showed significant changes in at least 3 categories.

[...]e purpose of the test[...] [...]m the study of Henry and Rotter (1956), who simply told their experimental [...] undergraduates what most college students have already presumably learned about the Rorschach test from television, the movies, and Life magazine: “This is a test to discover serious emotional disturbances.” It was found that the experimental group gave fewer responses (at the .01 level), more good form responses (.05 level), more popular responses (.05 level) and more animal responses (.05 level) than a control group given a standard administration. It was evident that making explicit the purpose of the test produced more constriction and more attempts to be safe than leaving the purpose unstated.

A study by Calden and Cohen 
(1953) investigated the influence of 
both ego involvement and instruc- 
tions regarding the nature of the 
Rorschach test. Half their senior 
high school Ss were given ego-in- 
volving instructions and half were 
given neutral instructions; one-third 
of the Ss were told that the Ror- 
schach tested intelligence, another 
third that it tested “nervousness,” 
and the last group that it measured 
imagination. An analysis of variance 
computed for 27 selected variables 
showed 19 differences significant at 
the .05 level. In general, the resulting 
personality pattern that emerged 
from the intelligence test instruc- 
tions resembled the same constricted, 
safe picture found by Henry and 
Rotter, form and animal responses increasing,
movement responses decreasing. “Needless to say,
predictions based on ‘blind’ interpretations
of the Rorschach protocol, without
knowledge of the testing situation or
the S's reactions to the testing, are
so much more fallible when viewed
in the light of the results of this
study” (pp. 308-9).

The TAT has also been used to study the effects of informing Ss of the purpose of the test (Summerwell, Campbell, & Sarason, 1958). Four different groups were used, each group receiving one of the following instructions regarding the purpose of the test: (a) the usual Murray instructions, (b) intelligence test instructions, (c) projective test instructions, (d) neutral instructions. It was found that the neutral instructions produced significantly different emotionally toned stories than any of the other three groups, with story outcomes significantly different from the standard Murray instructions and the projective test instructions.

The manipulation of time of re- 
sponse. In giving the group Ror- 
schach to 10 Ss under conditions of 
long time exposure (3 min.) and short 
time exposure (10 sec.), Weisskopf 
(1942) noted that the same personality
pattern emerged under both
conditions of administration.
Unfortunately, adequate experimental
controls were not employed. A more
rigorous procedure was utilized by 
Siipola and Taylor (1952) who gave 
the Rorschach test to experimental 
Ss seated in front of a noisy auto- 
matic timer which recorded reaction 
time of the first response. It was 
concluded that the pressure induced 
by the timer resulted in normal Ss 
behaving under stress much as dis- 
turbed patients do in a nonstress 
situation. “The moral of all this 
seems to be that the projective proc- 
ess, like any other psychological proc- 
ess, is not immune to the influences of 
the specific conditions under which it 
operates” (p. 46). 


THE INFLUENCE OF THE 
TESTING SITUATION 


The designs for the studies investigating the effects of
varying testing conditions take several forms. The
most rigorous of these utilizes a control group that
has been given two administrations of the test to
contrast with the experimental group which had
experienced the special conditions between the first and
second testing. If the projective test permits, some
investigators prefer to counterbalance the order of
presentation of the particular cards used, necessitating
at least two experimental groups and two control groups.
Another frequent design does not 
utilize a control group, the only 
comparison made being that of the 
first administration of the test with 
the second administration, with all 
differences between administrations 
assumed to be a function of the inter- 
vening conditions. A third procedure 
consists of administering a single test 
to groups known to differ on a par- 
ticular dimension; all differences in 
test results are then attributed to the 
central, identifiable difference be- 
tween the groups. 

Stress. The most careful, systematic effort to induce stress was that of Lindzey (1950a, 1950b), who frustrated his experimental [...] subjecting them to 10-12 hours of food deprivation, inducing them to drink a large quantity of water and then [...]ing them from urinating for approximately three hours, taking a blood sample in a painful way with a spring lancet, and by forcing them to fail in a group situation. As a [...] of these conditions the [...] Picture Frustration Test showed a significant increase in [...] responses (Lindzey, [...]ions regarding [...] TAT, 11 were in the expect[...] [...]nfirmed at the [...] theses co[...] Wer[...]man, 1955). Th[...] the Rosenzweig [...] studied by French (1950) who gave students in a social psychology class erroneous grades on an examination. Half the students who earned an A or B were given C or D, while half those earning a C or D were given an A or B. On the P-F test given immediately after the grades were returned the good students given the poor grades (the stress group) did not differ from the good students who were assigned their correct grades. However, the poor students given the erroneously high grades showed fewer intropunitive ego-defensive responses than the poor students given their correct grade.

Eichler (1951) used an elaborate 
device resembling an electric chair to 
seat his Ss while taking the Ror- 
schach. They were made to wear a 
helmet which looked as if it could 
conduct electricity and were told 
that while taking the test they would 
be given shock, “the longer the time 
interval that elapses without the 
receipt of shock the more intense 
the next shock will be.” On the basis 
of an administration of the Behn- 
Rorschach test, the experimental 
group was matched on five variables 
with a control group that took the 
second Rorschach under standard 
conditions. Judges who made a blind 
global rating of the Rorschach proto- 
cols found a significant difference in 
anxiety between the two groups. On 
15 anxiety indicators, however, they 
found that only 4 reflected a signifi- 
cant difference between groups while 
3 additional variables did not reach 
statistical significance but were in 
the predicted direction. 

Less dramatic forms of frustration seem also to be effective in demonstrating how projective devices may reflect pretest conditions. Crandall’s (1951) experimental Ss took tests of [...]ical skills between administrations of the TAT and were informed that they had not met the “norm.” [...]

[...]notic trance states. Hypnosis has
also been used to study age regression 
(Bergmann, Graham, & Leavitt, 
1947). The test responses of a 20- 
year-old S, given the Rorschach for 
the eight alternate years between 3 
and 20, seemed to reflect the various 
stages in the development of his per- 
sonality. 

Two studies used the Rorschach 
with Ss who had been instructed un- 
der hypnosis to feel hostile toward 
the examiner (Counts & Mensh, 1950; 
Pattie, 1954). To illustrate the diffi- 
culties involved in using hypnosis for 
this purpose, the conclusions in one 
study were different from those in the 
other. Counts and Mensh (1950) 
found that the hostility to the E 
which appeared during the psychi- 
atric interview was not reflected in the 
Rorschach, while Pattie (1954) could 
divide his 14 Ss into 3 groups, one 
of which (N=8) showed a two-fold 
increase in hostile content in the 
posthypnotic Rorschach. The num- 
ber of white space responses did not 
increase under the hostile instruc- 
tions. A considerable degree of indi- 
vidual differences was also found by 
Arluck and Balinsky (1953) who 
used as the hypnotic suggestion “you 
are mature, warm and outgoing,” 
etc. For some Ss the responses on 
both the Rorschach and a sentence 
completion test showed numerous dif- 
ferences with prehypnotic test rec- 
ords, but for others the two sessions 
produced highly similar results. 

Qualitative and quantitative differences following the hypnotic induction of an elated mood and a despondent mood were found on a word association test (Fisher & Marrow, 1934). The reaction times were longest for unpleasantly toned words and were fastest for neutral words.

Special training and experience. Several experiments have investigated the influence of perceptual training upon Rorschach scores.
Knopf (1954) provided pre-Ror- 
schach perceptual training in find- 
ing animal or animal parts for one 
group of Ss while the second group 
watched a film on the nature of color. 
He concluded that the over-all pic- 
ture of the personality remained bas- 
ically unchanged. Kurtz and Riggs 
(1954) similarly found no differences 
in group Rorschach scores in Ss who 
had first been exposed to a visual set 
to perceive animals. “So far as this 
study is concerned, Rorschach work- 
ers remain secure in the assumption 
that implicit peripheral sets will not 
influence test results to any appreci- 
able extent” (p. 469). Nor did Nor- 
man, Leverant, and Redlo (1952) 
find that Rorschach scores were al- 
tered by having one group of Ss first 
look at colored food ads while another 
group looked at pictures of people in 
motion. Evidence that perceptual training can
influence Rorschach performance has been reported by
Keyes (1954) and Leventhal (1956). Subjects trained
on stimuli similar to the Street Gestalt pictures
produced an increase in the number of whole
responses on the group Rorschach (Keyes, 1954).
Training on the Gottschaldt figures before an
administration of the group Rorschach resulted
in lower W and Z scores (Leventhal, 
1956). 

Giving children a “gratifying” ex- 
perience prior to testing seemed to 
improve performance on the Draw- 
A-Person test (Reichenberg-Hackett, 
1953). Coleman (1947) showed a 
neutrally toned film to children the 
evening before they took the second 
half of the TAT. Of the 370 stories 
told after the film only one clearly reflected
the content of the film. Kimble (1945) gave the
Rorschach test twice, once in the standard office
procedure and once in the college cafeterias with
at least two other people present. The test results
were quite similar in the two situations, the
most important difference occurring 
in the experience balance, more color 
responses being given in the social 
situation. Gibby, Stotsky, and Miller (1954) concluded that the experience of taking a psychological test (Bender-Gestalt, TAT, Wechsler-Bellevue, or Goldstein-Scheerer) immediately before an administration of the Rorschach did not alter any of 11 selected Rorschach variables.

Deprivation or arousal of needs. The effects of hunger on associations to ambiguous stimuli have received the attention of three investigators. School children showed more food associations on a word association test shortly before a regular meal than on the test given after the meal (Sanford, 1936). Undergraduates who had fasted for 24 hours before taking five projective tests gave more food responses than did Ss who had abstained from food for shorter periods of time. The increase was not a straight line function of time, however, the most hungry Ss giving only slightly more food responses than the groups examined near the end of a normal eating cycle (Sanford, 1937). Atkinson and McClelland (1948) were able to derive a food score on TAT stories that could differentiate reliably among groups deprived of food for 1, 4, or 16 hours. While there was no increase in food imagery in these stories, there was a marked increase in food deprivation themes. While the TAT may be able to reflect a state of food deprivation, the report of Franklin and Brozek (1949) indicated that the Rosenzweig P-F test responses of a group of conscientious objectors did not [...] significantly from a period of [...] starvation to a state of nutritional rehabilitation.

Two studies investigated the effects of motor inhibition on the production of movement responses in
the Rorschach test. Korchin, Meltz- 
off, and Singer (1951) restricted nor- 
mal motor behavior in their Ss by 
having them write a standard phrase 
as slowly as possible. Singer, Meltz- 
off, and Goldman (1952) reduced the 
motor behavior of Ss by requesting 
them to stand for five minutes with- 
out moving. In each experiment, the 
Ss denied motor activity gave more 
movement responses than a control 
group. However, Singer et al. (1952) 
also found that increasing motor be- 
havior (by having Ss engage in five 
minutes of vigorous calisthenics) did 
not reduce the number of movement 
responses. 

The effect of experimentally aroused needs on
projective test responses has also been investigated.
When the need for achievement was 
induced by means of ego-involving 
instructions, stories of college men 
increased in achievement imagery 
themes and themes of instrumental 
acts and attitudes related to achieve- 
ment (McClelland, Clark, Roby, & 
Atkinson, 1949). The stories told by 
high school boys to pictures of male 
figures also showed an increased num- 
ber of themes related to achievement 
as a result of special achievement 
arousing conditions. These achieve- 
ment arousing conditions, however, 
were not effective in increasing 
achievement themes of either high 
school girls or college girls (Veroff, 
Wilcox, & Atkinson, 1953). 

A sociometric test given prior 
to the administration of the TAT 
seemed to increase themes related to 
the affiliation motive (Shipley & Ver- 
off, 1952; Atkinson, Heyns, & Veroff, 
1953). Freshmen who were rejected for membership
in a fraternity gave more affiliation themes on the
TAT than students who were accepted into
fraternities (Shipley & Veroff, 1952).
Clark (1952) attempted to raise the
level of sexual motivation by show- 
ing Ss slides of nude females before 
administering the TAT. A control 
group was shown slides of landscapes. 
Under these conditions the group 
shown the slides of the nudes gave 
fewer manifest sexual responses and 
fewer themes of guilt than the con- 
trol Ss. However, when the experi- 
ment was repeated after all the Ss 
had participated in a beer party, the 
experimental group gave more stories 
of manifest sex and guilt than those 
Ss who had been shown the landscape 
slides. Rabin, Nelson, and Clark 
(1954) had one group of undergradu- 
ate males waiting to take the Ror- 
schach test remain in a room dec- 
orated with anatomical charts and 
surgical pictures; a second group, 
somewhat more fortunate in their 
assignment to experimental condi- 
tions, was seated in a room deco- 
rated with photographs of nude and 
seminude females; the control group 
waited in an undecorated room. 
While there was no difference between groups in the
number of anatomy responses, there was a significant
difference in the number of sexual responses.


THE INFLUENCE OF THE EXAMINER 


While the evidence regarding situational factors in
projective testing has been compiled over the years,
the studies dealing with interpersonal
influences are of rather recent origin.
Guilford’s 1947 study was the first in 
this area and it was three years be- 
fore the next experiments were re- 
ported. The relative neglect of this 
problem was not entirely due to lack 
of awareness of its importance, since 
McFarlane (1942) wrote that “inter- 
pretation in the hands of the clin- 
ically inexperienced, the doctrinaire, 
or the methodologically uninformed
easily degenerates into nothing more
but one more predictive tool—to wit, 
one which discloses the organizing 
dynamics of the interpreter rather 
than the organizing dynamics of the 
research subject” (p. 405), and Joel 
warned (1949) that “even if it were 
possible for the examiner always ac- 
tually to feel the way he pretends he 
does, we should not forget that the 
subject reacts not only to the exami- 
ner’s real attitude, but also to what 
he thinks the examiner's attitude is” 
(p. 480). Probably the greatest deter- 
rents to exploration of this question 
were the facts that the notion of 
E influence struck at the heart of 
the X-ray concept of projective tests
and that the extreme complexity and sub-
tlety of the interpersonal testing situ-
ation, so aptly discussed by Schafer 
(1954), made experimentation diffi- 
cult. 

The first studies tested the hy- 
pothesis that Es would differ in the 
responses they elicited from Ss by 
analyzing test records secured from 
the files of a clinic. As interest turned 
to determining which characteristics 
of Es were related to differences in 
Ss’ responses, such physical attributes 
of Es as skin color, size, and sex were 
investigated, as well as personality 
variables revealed in psychological 
tests, generally the Rorschach. The 
interaction of E and S has been
studied on several occasions, either
by controlling the warm-cold dimen-
sion or by contrasting tests taken
with E present with those taken
with E absent.

A completely different approach to 
this problem is through the use of 
hypnosis. While no study used hyp- 
nosis primarily to investigate the 
testing relationship, most of the ex- 
periments using hypnosis do report 
that S’s test behavior varied with 
the hypnotic suggestion. Thus far 
only two studies (Gross, 1959; Wick-
es, 1956) attempted to establish oper-
ant conditioning of S's verbal behav-
ior, but this method seems so promis-
ing that undoubtedly it will become 
more widely used in investigating ex- 
aminer influence. 

The examiner's physical characteris-
tics. The most immediately apparent
characteristics of the E are his skin 
color, sex, and body build. Each of 
these attributes has been investigated 
for possible influence on the S’s re-
sponses. Three studies related the
sex of E to sexual responses on the 
Rorschach test. Alden and Benton 
(1951) selected 100 Rorschach rec- 
ords from the files of a VA hospital; 
all test Ss were males, 50 of them 
tested by a female E and 50 by a 
male. There were no significant dif- 
ferences in either overt or covert sex- 
ual responses that could be attributed 
to the sex of E. Exactly contradic- 
tory results were reported by Curtis 
and Wolf (1951). Again using the 
Rorschach records of male veterans,
comparisons were made of the overt 
and covert sexual responses given by 
386 Ss to three female and seven male 
Es. Statistically significant differ- 
ences were obtained. Rabin et al. 
(1954) found that sometimes the sex 
of E makes a difference and sometimes
it does not. The Ss who had waited 
for the Rorschach examination in 
a room decorated with anatomical 
charts did not differ in the number 
of anatomical responses given to the 
male and female Es, but those male 
Ss who had waited in a room deco- 
rated with pictures of nude women 
gave significantly more sexual re-
sponses to the male E than to the
female. Clark (1952) found that male 
Ss gave more manifest sexual re- 
sponses and more guilt responses on 
the TAT to a male E than to an at-
tractive, rather seductive, female E.

The influence of the E's size on
Draw-A-Person productions was
investigated by Holtzman
(1952) who found that none of his 12
judges could guess better than chance 
either the sex or the identity of the 
Es by inspecting the drawings of 40 
male and 40 female Ss. Two male 
Es, one of whom was nearly a foot 
taller and 60 pounds heavier than the 
other, and two female Es differing in 
“degree of feminine qualities,” were 
used. Garfield, Blek, and Melker 
(1952) used two female and two male 
Es to administer the TAT to 54 male 
and 56 female Ss. Neither the sex of 
the E nor the interaction of the sex
of E and S produced significant dif- 
ferences in the stories. 

Riess, Schwartz, and Cottingham
(1950) investigated the responses of
Negro and white Ss to Negro and
white stimulus figures on TAT cards
administered by a Negro and a white
E, and concluded that skin color of 
the E did not affect the length of 
stories. While most investigators 
look for the influence of the E in 
verbalized test responses, Rankin 
and Campbell (1955) used as their 
dependent variable the galvanic skin 
response of male Ss to a Negro and 
white E. In a word association situa- 
tion the E checked and adjusted on 
four occasions dummy equipment 
connected to S’s left wrist. It was 
found that there was a higher differ- 
ential galvanic skin response to the 
Negro E, but since he was 9 years 
older, 23 inches taller and 27 pounds 
heavier than the white E, there was 
no conclusive proof that the differ-
ence was a function of skin color.

Presence or absence of examiner 
from the testing room. Bernstein 
(1956) found that TAT stories writ- 
ten in the absence of the E more fre-
quently contained sad themes, sad
outcomes, and showed greater S in-
volvement than stories written
with the E present in the testing room.
Certain aspects of H-T-P drawings
secured from applicants for employ-
ment at a state school were found to
be a function of the E’s presence when
the test was taken. Total size of the 
drawing, house size, house features, 
and person features all differed sig- 
nificantly between the group taking 
the test with E present and the group 
taking the test with the E absent 
(Cassel, Johnson, & Burns, 1958).
Van Krevelen (1954b) tested 20 Ss 
with 2 cards of the MAPS series; one 
story was dictated to her and the 
other was written in her absence. 
The only significant difference be-
tween the two groups was that the
more ambiguous of the two cards
produced more words in the written,
E-absent situation. The absence of
the E seemed to have more effect 
on Szondi test results (Van Krevelen, 
1954a). The Ss were sometimes ad- 
ministered the Szondi by the E and 
at other times took the test them- 
selves. When E was absent, Ss 
showed greater consistency, demon- 
strated more plus-minus reactions, 
and had a greater sum of open and 
plus-minus reactions than when E 
was present. 

Warm-Cold examiner behavior. Luft 
(1953) varied the interaction between 
E and S by acting warm and friendly 
to some Ss and cold and blunt to 
others. The cold interaction con- 
sisted of asking the Ss their social 
security number and draft status and 
giving a short quiz on current
events (e.g., “Which horse won the 
Kentucky Derby?”) before adminis- 
tering 10 homemade ink blots. When 
the Ss were asked which inkblots they 
liked and which they disliked, the 
group treated in the warm fashion in- 
dicated that they liked a mean of 7.6 
blots, while the cold Ss liked only a
mean of 3.1 blots, a difference signifi- 
cant beyond the .001 level. Lord 
(1950) used three styles of adminis-
tering the Rorschach (neutral, posi-
tive, and negative) and three female
Es. In the positive interaction E was 
instructed to look at S with a smile 
and to be warm and charming; the 
negative interaction called for E to 
assume the role of a harsh, demand- 
ing, authoritative figure, deliberately 
unconcerned about S. Each S took
the Rorschach three times with the 
order of E and administration coun- 
terbalanced. As a result of the dif- 
ferent methods of interaction, the 
protocols elicited from the warm
administration produced more re-
sponses, more evidence of intellectual
and creative imagination, less indica-
tion of stereotyped thinking and in-
creased evidence of greater ease in
interpersonal relations. In the cold
administration, responses indicating
imaginative, creative thinking were
reduced, there appeared to be a with-
drawal from emotional stimuli and
there was a rise in self-questioning
feelings.

Operant conditioning of subject's 
verbal behavior. Two studies investi- 
gated the extent to which S's associ- 
ations to ink blots could be deter-
mined by the E’s behavior. Wickes
(1956) used 30 homemade inkblots, 
two Es, and 36 undergraduate Ss di- 
vided into two experimental groups 
and one control group. In one ex- 
perimental group the first 15 cards 
were given in the standard manner; 
with Card 16 the E said, “fine,” to
the first movement response, “good,” 
to the second movement response, 
and “all right” to the third move- 
ment response, making these com- 
ments in regular sequence to the end 
of the testing. In the second ex- 
perimental group the first 15 cards 
were given in standard fashion, but 
with Card 16 the E made various 
postural and gestural changes, nod-
ding his head three times to the first
movement response, smiling on the 
next movement response and leaning 
forward in the chair after the third
movement response, repeating this
sequence to the end of the testing. 
When the responses on the first 15 
and second 15 cards were compared 
for all groups, it was found that the 
experimental group given verbal re- 
inforcement made a significant in- 
crease (.025 level) in movement re- 
sponses on the second half of the 
test, the experimental group given 
postural reinforcement made a sig- 
nificant increase (.005 level) in move- 
ment responses in the second half of 
the test, while there was no difference 
in the number of movement re- 
sponses in the control group. Gross 
(1959) verbally reinforced (“good”) 
one group of Ss for each human re- 
sponse on the Rorschach while for 
a second group he nodded his head 
once following each human response. 
Both the verbally reinforced group 
and the nonverbally reinforced group 
produced more human responses 
than a control group.
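
The verbal reinforcement schedule attributed to Wickes (1956) above can be sketched as a short routine. This is only an illustration of the schedule as summarized in this review, not a reproduction of Wickes's actual procedure: the function name `reinforcement_schedule`, the input format (a list of card number and movement-response pairs), and the sample responses are all hypothetical.

```python
from itertools import cycle

# Comments used in the verbal-reinforcement group, in the order given in the text.
VERBAL_REINFORCERS = ("fine", "good", "all right")


def reinforcement_schedule(responses, reinforcers=VERBAL_REINFORCERS,
                           first_reinforced_card=16):
    """Return the comment E makes after each response (None means no comment).

    `responses` is a list of (card_number, is_movement_response) pairs in the
    order the responses were given. Only movement responses from
    `first_reinforced_card` onward are reinforced, cycling through
    `reinforcers` in regular sequence.
    """
    next_reinforcer = cycle(reinforcers)
    schedule = []
    for card_number, is_movement in responses:
        if card_number >= first_reinforced_card and is_movement:
            schedule.append(next(next_reinforcer))
        else:
            schedule.append(None)
    return schedule


# Hypothetical record: one response on Card 15, then responses on Cards 16-19.
sample = [(15, True), (16, True), (16, False), (17, True), (18, True), (19, True)]
print(reinforcement_schedule(sample))
```

The postural condition follows the same pattern if the three comments are replaced by the three gestures (nodding, smiling, leaning forward).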

The effect of the examiner as a per-
son: no assessment of personal-
ity. In a report not seen by the
writer, but quoted by Lord (1950)
and others, Guilford (1947)
concluded that some Es elicited more
responses from their Ss in administer- 
ing the Rorschach test than other Es. 
Baughman (1951) also noted that cer- 
tain Es seemed to produce more re- 
sponses in selected Rorschach cate- 
gories than other Es. He selected 633 
protocols secured by 15 Es from the 
files of a veteran’s outpatient clinic, 
and found 12 of 22 scoring categories 
differing significantly at the .001 level, 
with four additional differences sig- 
nificant at the .05 level. Unfortu- 
nately, the protocols were not scored 
by the investigator, so that the differ- 
ences found may have resulted from 
the psychologists’ procedures in scor- 
ing, rather than from their influence 
on S. Both Wickes (1956) and Bern-
stein (1956) used two Es in their
studies and each concluded that the 
psychologists did not exert any sig- 
nificant influence on the Ss’ responses. 

To control for some of the sources 
of variation in responding to the Ror- 
schach, Gibby (1952) had 9 Es use a 
standardized inquiry in testing 135 
Ss. Despite this, significant differ- 
ences in responses were found for 6 
of the 11 determinants investigated. 
“The stimulus value of the examiner 
therefore must be considered a factor 
which influences the inquiry re- 
sponses of the subject and therefore 
the final Rorschach psychogram. 
Standardization of the inquiry does 
not eliminate examiner differences” 
(p. 452). An attempt was made by 
Gibby, Miller, and Walker (1953) to 
secure a homogeneous group of pa- 
tients whose Rorschach records could 
be analyzed for E influence. All Ss 
used were male veterans, white, 25- 
32 years old, had functional rather 
than organic ailments, and were the 
most recent patients tested by E. 
All 12 Es whose records were ana- 
lyzed had a minimum of two years 
experience, used Beck’s Rorschach 
method and had tested at least 20 
Ss who met the criteria for the study. 
All protocols were coded and scored 
blindly. Of the nine absolute scores 
which were investigated, three were 
significant at the .05 level or better, 
and of the 8 percentage scores, 3 
were significant at the .05 level or 
better. The investigators concluded 
that “there are significant over-all 
differences in the determinants ob- 
tained by various examiners from 
comparable groups of subjects. . . .
It is also probable that certain ex- 
aminers tend to obtain successively 
dysphoric records while others with 
comparable patients will seldom elicit 
such reactions” (p. 426). 

Robinson and Cohen (1954) exam- 
ined the case reports prepared by 
three psychological interns. The last
30 reports for each of the three in-
terns were examined for the variables
of dependence, independence, aggres- 
sion, and abasement. When the inci- 
dence of these variables in the case 
reports was compared for each of 
the three psychologists, 6 of the 12 
comparisons were significant at the 
.05 level. When components of 
these variables were considered, 12 
of the 24 comparisons were signifi- 
cant at the .05 level of confidence. 
The investigators concluded that 
this study raises “a serious question
about the objectivity of methods of 
evaluation and prediction if they 
must rely solely upon psychological 
reports for their basis” (p. 335).

The effect of the examiner as a per-
son; examiner's personality assessed.
Hammer and Piotrowski (1953) 
asked three staff psychologists and 
three interns to rate 400 H-T-P draw- 
ings on a 3-point scale of aggression. 
The clinicians were themselves rated 
by one of the investigators on the de- 
gree of aggression and hostility they 
manifested in dealing with patients 
and staff members. In addition, the 
clinicians also took the Szondi test 
which was scored by Susan Deri for 
degree of hostile and aggressive im- 
pulses. A rank order correlation of 
.94 was found for the degree of hos-
tility the clinicians saw in the H-T-P
productions and the evaluation made 
of their interpersonal hostility. The 
rating of the clinicians’ Szondi tests 
also yielded a rank order correlation 
of .94 with their evaluations of the 
hostility in the H-T-P. The authors
concluded that “just as a subject's
performance on a projective tech- 
nique is a function of his personality, 
his needs, conflicts, desires and past 
experiences, so too, although to a les- 
ser degree, is the interpretation of a 
projective protocol influenced by the 
personality pattern of the interpre-
ter” (pp. 214-215). Filer (1952) ana-
lyzed 156 case reports submitted by
13 male clinicians for references to 
hostility, hostility turned inward, 
passive dependency, and feelings of 
inferiority. Evaluations were also 
obtained of the clinician’s behavior 
in terms of ascendency, depression, 
intropunitiveness, extrapunitiveness, 
and impunitiveness. The results indi-
cated a complex relationship between
the judgments made about the clinician and
the judgments made by the clinician.
For example, Es who stressed hos-
tility turned inward were rated de-
pressed and intropunitive. A sepa-
rate study of references to defense
mechanisms indicated that the three 
most frequently mentioned mech- 
anisms were more characteristic of 
E than the test S. 
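
The rank order correlations of .94 reported by Hammer and Piotrowski are easier to weigh once the arithmetic is written out. The sketch below assumes the standard Spearman formula without ties and assumes that the six clinicians (three staff psychologists and three interns) were the units ranked; the rank lists themselves are hypothetical and are not data from the study.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank order correlation for untied ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for six clinicians: hostility seen in the drawings versus
# rated interpersonal hostility, identical except for one adjacent reversal.
seen_in_drawings = [1, 2, 3, 4, 5, 6]
rated_in_person = [1, 2, 3, 4, 6, 5]

print(round(spearman_rho(seen_in_drawings, rated_in_person), 2))
```

Under these assumptions, a single reversal of two adjacent ranks gives a sum of squared differences of 2 and a coefficient of 1 - 12/210, which rounds to .94. With only six raters the coefficient can take few distinct values, so a very high rho rests on very few observations.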

Berger (1954) used the personal 
Rorschach results of eight VA train- 
ees to compare with the Rorschach 
records these Es elicited from VA pa- 
tients. Contrary to the findings of 
most other studies, Berger found no 
significant differences in E influence 
on 12 variables in the patients’ Ror-
schachs. However, when both the E
and his Ss were rank-ordered on the 
12 variables, a rho correlation of .86 
was found for the number of popular 
responses, a rho of .80 for white space 
responses and a rho of —.54 for Y 
responses. 

Lord (1950) did not attempt to 
make a formal assessment of the per- 
sonalities of her three female Es, but 
she obtained descriptions of them at 
a subjective, intuitive level from two 
clinicians, relating these descriptions 
to the Rorschach responses secured 
from Ss. The first E, described by
the judges as a cold, inflexible, mas- 
culine, castrating woman, produced 
Rorschachs from her Ss that made it 
appear that they had faced a threat- 
ening, frustrating situation. The 
second E, described as the most femi-
nine of the group, the softest and
most mother-like, elicited Rorschach
records that suggested her Ss had 
been under controlled excitement 
without great anxiety or tension and 
had not been greatly stimulated in- 
tellectually. The third E was the old- 
est of the three and was described as 
the most flexible, exuberant, sympa-
thetic, intermediate in feminine qual-
ities. Her Ss’ Rorschach records sug-
gested that they had been challenged 
and made anxious, but there was also 
evidence of easy rapport and a rela- 
tive absence of controlling devices. 
The TAT and MMPI records of 
nine Es were evaluated and com- 
pared with the Rorschach responses 
the Es had elicited. An analysis of 
E variance showed 22 of 37 selected 
Rorschach variables significant at 
the .05 level or better (Miller, Sand- 
ers, & Cleveland, 1950). Sanders 
and Cleveland (1953) trained nine 
second year graduate students to ad- 
minister the Rorschach test after first 
obtaining a personal Rorschach from
each. After a period of training, each 
E administered 20 Rorschachs to un- 
dergraduate Ss; at the end of each 
testing session S was asked to rate 
E on measures of overt anxiety and 
hostility. Indications of Es’ covert 
anxiety and hostility were obtained 
by rating their personal Rorschach 
records. When the number of re- 
sponses given by each S was held 
constant it was found that of the 
20 variables investigated, 9 differed 
significantly among Es. The Es rated
high on overt anxiety by their Ss
elicited more responses, more white
space responses and more color re-
sponses than those rated low on overt
anxiety. No differences [illegible]
could be accounted for by [rating]
the Es on covert anxiety. Es whom
their Ss rated high on hostility
elicited greater [illegible]; [in the]
Rorschachs they
secured less hostile and human con-
tent than those Es whose Ss rated
them low in hostility. The Ss’ reac- 
tion times seemed slower when E was 
perceived as hostile, a finding in 
agreement with the observation made 
of the test behavior of Ss with hyp-
notically induced hostility to E
(Counts & Mensh, 1950; Pattie, 
1954). When Es’ personal Ror- 
schachs were examined for hostility, 
it was found that the more hostile Es 
elicited less Y%, less A%, more hos-
tile content and more human content
than those Es with low covert hos- 
tility. The authors concluded that 
“The results of this study provide 
evidence that a subject’s responses 
on the Rorschach test are not solely a 
product of his own emotional prob- 
lems and personality structure” (p. 
48). While the investigators clearly 
indicated that E as a person can in- 
fluence the test results, they esti- 
mated that the extent of this influ- 
ence can account for only 3-7% of 
the total variance in the Rorschach 
scores. 


THE INFLUENCE OF THE SUBJECT 


While there is wide theoretical 
agreement that each party in the 
testing situation exerts an influence 
on the other, the experimental evi- 
dence is limited almost entirely to the 
effect E has on S. Perhaps the chief 
reason for the failure to investigate 
the manner in which S can influence 
the psychologist is the lack of tradi- 
tional experimental procedures that 
can afford control of the S's behavior 
while the E’s behavior is allowed to 
vary. The one study in this area
(Masling, 1957) controlled S's be- 
havior by using attractive female ac- 
complices who posed as test Ss, act-
ing warm or cold to E. The de-
pendent variable was the interpreta-
tion placed on sentence completion 
protocols by eight graduate student 
Es. It was found that when S acted
warm to E her protocol was inter-
preted more favorably (i.e., she was
seen in better mental health) than 
when she acted cold. In addition, 
the results indicated that when E 
saw two Ss, one of whom was cold 
and the other warm, the protocol of 
the warm S was interpreted more 
favorably than that of the cold S. 


DISCUSSION


The studies presented in this paper
have been reviewed rather uncrit- 
ically, with the emphasis on content 
rather than on adequacy of experi- 
mental design. Since faulty experi- 
mental procedures appear with regu- 
larity in the studies in this area, it 
might be worthwhile to examine 
more closely the more commonly 
found limitations in design: 

1. No study reviewed here exten- 
sively sampled the E population. As 
Hammond (1954) has indicated, rep- 
resentative design demands that both 
E and S populations be adequately 
sampled if generalizations are to be 
made to larger groups of S and E. 
Most studies cited here, however, 
utilized only one E, with only Baugh-
man (1951) using as many as 15 Es.
The general results of the work on E
differences make clear how tenuous
it is to assume that one E is drawn
from the same population as any
other E.

2. Not only has the E population 
been inadequately sampled, but the 
little attention given to Es has been 
directed for the most part to grad- 
uate students. While it is legitimate 
to work with a graduate student 
population it is inappropriate to gen-
eralize findings to a population of
older, more experienced Es. 

3. Those studies which investi- 
gated the influence of E differences 
by utilizing a random sample of cases 
found in the files of a clinic make the
assumption, as Levy (1956) has indi-
cated, that the cases were originally
assigned on a random basis. This as- 
sumption may not always be valid, 
due to differences in E schedules, in-
terests, and competence. As a result,
differences in test records may be in 
part a function of uncontrolled bias 
in the selection of Ss. It is far better
procedure for the investigator to con- 
trol the assignment of cases than to 
assume existing cases had been ran- 
domly assigned. 

4. A frequent method of assessing 
E personality has been to ask the E 
to take a psychological test. As the 
results of this review make quite 
clear, the orientation S has toward 
the test considerably influences his 
responses. Few graduate students in
psychology are naive regarding the
more common projective tests, even
if they have never seen them before.
The meaning of a Rorschach test
taken by a graduate student, there-
fore, is unclear and cannot be easily
related to differences in Ss’ responses.
A better way of evaluating E’s per- 
sonality might be to obtain judg- 
ments by his supervisors and col- 
leagues. Another method that shows 
promise is that used by Sanders and 
Cleveland (1953), who asked the test 
Ss to make ratings of their impres-
sions of E.

5. Most investigations of suscepti- 
bility of Ss’ responses to situational 
influence have been conducted em-
pirically, with no prior attempt made
to predict where differences would be 
found. Research on the Rorschach 
has been particularly culpable in this 
regard. Since the Rorschach is still 
primarily an empirically, rather than
theoretically, based instrument, most 
investigators have attempted to de- 
termine only if differences would oc- 
cur between experimental and con- 
trol groups, but, on finding differ- 
ences, have been unable to interpret 
their meaning. As a result of this ap-
proach, almost every Rorschach score
has at one time or another been found 
to be a function of some experimen- 
tal variable: Z, W, W%, Ds, D, Dd%, 
F-, F+%, F%, FM, m, M, CF, G,
Y, Y%, A, A%, P, R, reaction time 
and experience balance have all been 
reported to change as a result of ex-
perimental conditions.

6. Many of the Rorschach studies 
appearing before Cronbach's (1949) 
critique of Rorschach research did 
not control for the number of re- 
sponses, but assumed that all differ- 
ences in determinants could be at- 
tributed to the experimental vari- 
able. To a large extent investigators 
now attempt to partial out differ- 
ences in the number of responses, 
but an occasional study will dis-
regard this factor. While most of the
other statistical errors [illegible]
occur far less frequently in
[illegible], inflation of proba-
bility levels continues to be a major
error.
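
The inflation of probability levels mentioned in the last point can be illustrated with a little arithmetic. The sketch below is not taken from any of the studies reviewed; it assumes that the significance tests within a study are independent, which Rorschach scores certainly are not, so the figures are only a rough guide. The function names are hypothetical, and the values of k are the numbers of scores reported above for Gibby (11), Baughman (22), and Miller, Sanders, and Cleveland (37).

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Probability that at least one of k independent tests reaches
    significance at level alpha when no true differences exist."""
    return 1 - (1 - alpha) ** k


def expected_chance_hits(k, alpha=0.05):
    """Expected number of 'significant' results among k tests by chance alone."""
    return k * alpha


def prob_at_least(hits, k, alpha=0.05):
    """Binomial probability of `hits` or more chance-significant results
    among k independent tests."""
    return sum(comb(k, i) * alpha ** i * (1 - alpha) ** (k - i)
               for i in range(hits, k + 1))


for k in (11, 22, 37):
    print(k, round(familywise_error(k), 2), round(expected_chance_hits(k), 2))

# Chance probability of 12 or more significant results among 24 comparisons,
# the count reported by Robinson and Cohen (1954), under the same assumption.
print(prob_at_least(12, 24))
```

On this reckoning one or two nominally significant scores in a long list deserve little weight, whereas a count of significant results far above k times alpha is hard to attribute to chance, provided the independence assumption is not badly violated.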

Despite these flaws in design, the
studies cited here presented strong
evidence of situational and inter-
personal influences in projective test-
ing. It is important to note, how-
ever, that the projective response
did not change with any and all con-
ditions imposed by the E. The use
of drugs produced only minimal
changes in the protocol and there was
conflicting evidence regarding the im-
portance of such physical differences
among Es as skin color, sex, and [size].
What appeared to be [illegible] extent
to which [illegible] attitude toward
the total testing situation was
influenced by the experimental
conditions. Where the experimental
variable was peripheral [illegible]
unique and sufficiently contiguous to
the testing session, as in the use of a 
waiting room decorated with pictures 
of nude women (Rabin, Nelson, & 
Clark, 1954), Ss evidently construed 
the experimental conditions to be a 
part of the total testing situation. 

There is considerable evidence that 
Ss in an unstructured situation will 
utilize all available cues to complete 
their assigned task. The S in the 
projective test setting will not only 
use those cues furnished by the ink 
blot or picture, but also those sup- 
plied by his feelings about the ex- 
aminer, those furnished by his needs, 
attitudes, and fears, those implied in
the instructions, the room, and previ- 
ous knowledge of the test, and those 
cues supplied consciously or uncon-
sciously by E. When E faces the am-
biguous situation of supplying mean-
ing to a series of isolated, discrete re- 
sponses, he will not only rely on S's 
responses, but also on those cues fur- 
nished by his training and theoretical 
orientation, his own needs and ex- 
pectations, his feelings about S and 
the constructions he places on S's 
test behavior and attitudes. In short, 
these studies demonstrate that E 
and S behave as we should expect, 
considering our knowledge of behav- 
ior in ambiguous settings. 

Thus, the procedure that many 
clinicians hoped would serve as an 
X ray proves, on close examination, 
to function also as a mirror, reflecting 
impartially S, E, the situation and 
their interactions. This need not be 
a cause for despair, except for those 
who feel that E and situational influ- 
ences contaminate a protocol. These 
influences are not sources of error, 
however, but indications of adapta- 
tion to the task. One reason for the 
poor record of blind analysis as a pro- 
cedure for validating projective de- 
vices is that this method can utilize 
only a fraction of the material avail-
able in a protocol. Instead of trying
to eliminate interpersonal and situa-
tional influence, E might better make
a more thorough search of his own 
attitudes and of S’s attitudes toward 
the test and the situation (Leventhal, 
Rosenblatt, Gluck, & Slepian, 1958). 
The interpersonal situation “is not an
evil. It should not be striven against. 
As in psychoanalytic technique, this 
relationship must be regarded as in- 
evitable, as a potentially significant 
influence on the patient’s produc- 
tions, and as a possible goldmine of 
material for interpretation” (Scha- 
fer, 1954, p. 6). 

The important problems in this 
area remain unsolved. What effect 
does experience have on E’s sensi- 
tivity to the attitudes of S? Of 
what importance is the psychological 
health of S in his response to the at-
titudes of E? Little is under-
stood about the circumstances which
prompt an S to rely heavily on inter-
personal cues, nor do we know much 
of the forces acting on an E who is 
faced with a belligerent, or overly 
cooperative or suspicious S. Most 
important of all, the great bulk of 
the studies cited here indicate only 
that situational and interpersonal 
variables influence the projective 
protocol, without in any way describ- 
ing how these variables impinge on 
E and S. How S senses that E is hos-
tile and how E realizes S is trying to
control him can only be determined 
by studying the interaction process 
itself. It is interesting to note that 
no study reviewed here used verba- 
tim recordings of the testing session, 
despite the growing popularity of 
verbatim transcripts in research on 
psychotherapy. Hopefully, future re- 
search on projective testing will in- 
vestigate more fully the step-by-step 
transactions between E and S.


NOTE ON THE MULTITRAIT-MULTIMETHOD MATRIX


LLOYD G. HUMPHREYS 
University of Illinois 


The recent article by Campbell
and Fiske (1959) is stimulating and
sound, but one of their conclusions
can be supplemented to indicate a
desirable direction for further re-
search. They conclude as follows:
“Measures of the same trait should
correlate higher with each other than
they do with measures of different
traits involving separate methods.
Ideally, these validity values should
also be higher than the correlations
among different traits measured by
the same method.” There is no argu-
ing with their first criterion, and
their second is hedged by the use of
“ideally.” Even so, the second is too
strong. It is the purpose of this note
to show that the second criterion is
better characterized as “nice.” The
degree of importance to be attached
to it is simply a function of the num-
ber of different methods that can be
used to measure the trait.

The procedure required to handle
the problem posed by correlations
between measures of different traits,
utilizing the same method, that are
higher than the correlations between
measures of the same trait across
different methods is implicit in Spear-
man. It has also been suggested by
several measurement theorists subse-
quent to Spearman, but has rarely
been followed explicitly as a basis
for test construction. Yet, consider-
ing the present status of personality
measurement, one might well con-
clude that this procedure constitutes
our only hope in this area.

Consider the following four rows of
a hypothetical rotated factor matrix:


           Trait  Trait  Method  Method  Method  Method  Method  Method
             A      B       I      II     III      IV       V      VI
Test 1      .40    .00     .80     .00     .00     .00     .00     .00
Test 2      .40    .00     .00     .80     .00     .00     .00     .00
Test 7      .00    .40     .80     .00     .00     .00     .00     .00
Test 12     .00    .40     .00     .00     .00     .00     .00     .80


The correlation between Variables
1 and 2, both assumed to measure
the same trait, is only .16, but the
correlation between 1 and 7, measur-
ing different traits, is .64. Clearly,
the ideal of Campbell and Fiske
(1959) is not met, and by a very sub-
stantial amount. But now let us as-
sume that Variables 3 through 6 (not
shown) also have loadings of .40 on
Trait A, and each has a loading of .80
on one of the succeeding methods
factors (Factors III through VI).
Similarly, Variables 8 through 11
have loadings of .40 on Trait B, with
method loadings of .80 on succeeding
factors. If we now sum Variables 1
through 6 and Variables 7 through
12, the two aggregates will have the
following factor loadings, as deter-
mined by the formula for correla-
tions of sums:


              Trait  Trait  Method  Method  Method  Method  Method  Method
                A      B       I      II     III      IV       V      VI
Sum of 1-6     .73    .00     .24     .24     .24     .24     .24     .24
Sum of 7-12    .00    .73     .24     .24     .24     .24     .24     .24


The factor loadings for method of
.80 have now shrunk to .24 and the
loadings of .40 for traits have in-
creased to .73. The correlation be-
tween the two measures of the or-
thogonal traits is .35, because they
have methods variance in common,
and the correlation with an equally
fallible measure of the same trait is
.53. Even so, we can conclude
that our two aggregates have psy-
chometric characteristics which make
them potentially useful tools for
further research involving these two
traits. Both could be further im-
proved by the addition of methods,
by the use of nonoverlapping meth-
ods, or by the use of suppressors for
methods.

Note that the first criterion of
Campbell and Fiske must hold if this
[procedure] is to be successful. Beyond
that, success depends entirely on our
[ingenuity] in devising enough methods
of measurement for each trait in which
we are interested. The number of
methods required by any particular
trait will depend on how low the
intercorrelations are when the trait is
held constant and methods vary, as
opposed to how high the intercorre-
lations are when methods are held con-
stant and traits vary.

In discussing so glibly the problem
of increasing the number of methods
which measure the trait in question,
the writer does not intend to convey
the impression that the problem is
an easy one empirically. Quite the
reverse is true, but the difficulty of 
the search should not deter us. After 
all, we can hardly point with pride 
to what we have accomplished in the 
personality realm in the attempt to 
find good trait measures defined by 
single methods. 
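
The aggregate loadings discussed in this note follow from the formula for correlations of sums, and a short numerical check may help. The sketch below is a minimal illustration in Python. It assumes six standardized tests per trait, each loading .40 on a common trait factor and .80 on its own method factor, with all factors orthogonal. The function name `aggregate_loadings`, and the assumption that the "equally fallible" comparison measure is built from six other, nonoverlapping methods, are introduced here for illustration only.

```python
import math

def aggregate_loadings(n_tests, trait_loading, method_loading):
    """Loadings of an unweighted sum of n_tests standardized tests.

    Assumes each test loads on one shared trait factor and on its own
    orthogonal method factor, so two tests in the sum correlate only
    through the trait (r = trait_loading ** 2).
    """
    r_within = trait_loading ** 2
    # Variance of a sum of n standardized variables with common correlation r.
    var_sum = n_tests + n_tests * (n_tests - 1) * r_within
    sd_sum = math.sqrt(var_sum)
    trait = n_tests * trait_loading / sd_sum
    method = method_loading / sd_sum
    return trait, method

trait, method = aggregate_loadings(6, 0.40, 0.80)

# Two aggregates measuring different traits share all six method factors.
r_different_traits = 6 * method ** 2

# An equally fallible aggregate of the same trait built on six other methods
# (an assumption of this sketch) shares only the trait factor.
r_same_trait = trait ** 2

print(round(trait, 2), round(method, 2))
print(round(r_different_traits, 3), round(r_same_trait, 3))
```

Under these assumptions the trait and method loadings round to .73 and .24, and the same-trait correlation to .53. The unrounded correlation between the two aggregates is about .356; summing six squared loadings that have already been rounded to .24 gives approximately the .35 cited in the text.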

The reader should not conclude 
that aggregating across methods to 
obtain better trait measures is re- 
stricted to the personality realm. 
Systematic application of this pro- 
cedure to aptitude measurement is 
also desirable. Guilford’s analysis of 
the ability domain (1956) could serve 
as the basis for deciding which di- 
mension or dimensions should be 
added across in constructing socially 
or scientifically useful trait measures. 
This is in contrast to a point of view, 
which is apparently held by Guil- 
ford, that recommends a separate 
test for every cell in his table. Thus, 
in constructing a reasoning test, it 
might be desirable to aggregate 
across methods, e.g., series, analogies, 
and classification items; and across 
content, e.g., words, numbers, fig- 
ures, letters, and objects. The fact 
that an ability can be fractionated is 
insufficient basis for concluding that 
it should be fractionated.


INCENTIVE MAGNITUDE, LEARNING, AND
PERFORMANCE IN ANIMALS¹


BENJAMIN H. PUBOLS, JR.²
University of Miami 


Interest in the effects of variations
in incentive magnitude on learning
and performance in animals has in-
creased markedly during the last
decade, and there are now more than
75 experimental papers dealing with
various aspects of this topic. These
papers have been concerned, for the
most part, with the following prob-
lems:

1. What is the effect of incentive magnitude
on rate of learning and asymptotic perform-
ance?

2. Does variation in incentive magnitude
affect learning, performance, or both?

3. What is the effect of incentive magnitude
on resistance to experimental extinction?

4. Does incentive magnitude interact with
other parameters of reinforcement, and
if so, in what way?

5. What is the nature of the mechanism,
or dimension of reinforcement, whose quanti-
tative variation produces behavioral changes?

6. Are there any differences in performance
as a function of whether a given S experiences only
one incentive magnitude in a given situation,
or more than one, allowing possible compari-
sons?
7. What is the effect of incentive magnitude
on acquired reward value?

Such terms as incentive, reward, and rein-
forcement refer, in this paper, to stimuli
which strengthen the responses they follow,
without implication of any particular
theoretical [...]


This paper will follow an outline 
suggested by these questions, and in 
each case an attempt will be made to 
draw conclusions as definite as the 
available evidence allows. 

A brief review of some general 
methodological considerations would 
seem to be in order to help provide a 
framework for the discussions which 
will follow. A variety of types of ex-
perimental operation have been used
by different investigators in the
manipulation of incentive magnitude.
These operations have tended to fall
into one of five classes, usually in-
volving the manipulation of food in
some form, but sometimes water, or
a receptive female of the same spe-
cies.

These classes of operations are:
(a) variation in weight, volume, or 
size of a single incentive unit; (b) 
variation in number of equal-weight 
incentive units; (c) variation in dura- 
tion of exposure to the incentive; (d) 
variation in concentration of sugar, 
usually sucrose, solutions; and (e) 
variation in incentives which do not 
alter the effects of deprivation, such 
as concentration of saccharine, or ex- 
tent of incomplete sexual behavior. 

It should be apparent that the 
actual dimensions, or mechanisms, 
of quantitative variation are several, 
and that they may differ among
these classes of operations. Guttman
(1953) has pointed out that the ex- 
pression, “quantity of reinforcing 
agent,” may refer to several dimen- 
sions of variation. He lists these as 
follows: (a) amount of nutrient ma- 
terial available for assimilation, in 
terms of weight or volume; (b) stimu- 
lation (primarily visual, but may also 
be olfactory or tactile) derived from 
the incentive prior to consummatory 
behavior; (c) amount and nature of 
consummatory activity involved; 
and (d) stimulation from the incen- 
tive during consummation (e.g., taste 
characteristics). 

Kling (1956) has further indicated 
that the third factor, consummatory 
activity, may be broken down into 
several subdimensions, which are: 
number of consummatory responses, 
duration of consummatory activity, 
ratio of consummatory to noncon- 
summatory responses in the goal 
area, and rate of consummatory re- 
sponding. 

Different combinations of varia- 
tion result from different operations. 
Thus, the two classical operations, 
manipulation of weight and number, 
each involve simultaneous variation 
along all four dimensions. Manipula- 
tion of duration of incentive exposure 
typically involves changes in amount 
of nutrient material and consumma- 
tory activity. The manipulation of 
sugar concentration involves con- 
comitant variation in amount of 
nutrient material and consummatory 
stimulation. And the fifth operation 
may lead to concurrent variation in 

amount of consummatory activity 
and stimulation during that activity. 
The consideration of differences in ex- 
perimental manipulations will be 
central to the discussion of mecha- 
nisms of reinforcement. 

A second category of methodologi- 
cal differences is related to another 
question raised above. This concerns
the number of incentive magnitudes
a given S experiences. Lawson (1957)
calls the method whereby each S ex-
periences only one value the “abso-
lute” method, and the method
whereby each S experiences more 
than one value the “differential” 
method. For the most part, studies 
utilizing the differential method will 
be considered in a separate section. 

To answer the question of whether 
variations in incentive magnitude 
affect learning or performance, a 
special two-phase experimental de- 
sign will be required. This will be 
outlined in the appropriate section. 
However, another device for dis- 
tinguishing between learning and 
performance effects, which will be 
adopted in this paper, requires com- 
ment now. This is the use of meas- 
ures involving time (e.g., latency, 
running time, speed of response, rate 
of responding) as measures of per- 
formance, and the use of time-inde- 
pendent measures (e.g., errors, trials 
to criterion) as measures of learning. 
Although there are perhaps no a 
priori grounds for making this as- 
signment, the review of the litera- 
ture which follows should bear out 
its worth. 

It will be noted that certain topics 
are being omitted which might seem 
relevant to the issues discussed in 
this paper. For example, theoretical 
considerations are minimized. This
is not to say that the writer feels 
them to be unimportant or unsuccess- 
ful. Rather, the aim of the present 
paper is to attempt to order the em- 
pirical evidence regarding incentive 
magnitude, so that the theorist will 
have a clearer picture of the data 
with which he will work. That is, 
the paper will attempt an empirical, 
rather than a theoretical, integration.
Theoretical discussions will be found 
in most of the papers to be reviewed, 
especially the following: Crespi
(1944); Hull (1943, pp. 124-134;
1952, pp. 140-148); Meyer (1951);
Pereboom (1957b); Reynolds (1949);
and Spence (1956, pp. 127-148).
Also omitted are latent learning
studies of the Blodgett type (e.g.,
Blodgett, 1929; Tolman & Honzik,
1930). They are especially pertinent
to the present paper, as it is possible
to interpret them as studies involv-
ing a change in incentive magnitude,
from a minimal amount to a larger
one. When so inter-
preted, their results agree quite well
with the results of studies to be re-
viewed. However, the latent learn-
ing experiments have been ade-
quately reviewed elsewhere a number
of times (e.g., Thistlethwaite, 1951).
Finally, studies in which reinforce- 
ment is administered other than 
peripherally (e.g., intravenously or 
by stomach fistula) are excluded. In 
other words, this review will be re- 
stricted to studies in which incentives 
are administered peripherally, and 
involving exteroceptors. 


RATE OF LEARNING AND ASYMPTOTIC
PERFORMANCE


The evidence to be reported in this 
section will be based on situations in 
which animals are given a series of 
rewarded training trials under one of 
several quantities of reinforcement, 

and measurements are made of ter-
minal performance level and rate of
approach to this level. Unfortunately,
there is not sufficient evidence in all
cases that these terminal levels ac-
tually represent asymptotic per-
formance. Nevertheless, some meas-
ure of performance over the final few
training trials is usually given.

Many investigators have taken
differences in rate of approach to as-
ymptotic performance level as a func-
tion of incentive magnitude to reflect
an effect on rate of learning, and dif-
ferent asymptotes to reflect a differ-
ence in performance. While the
former interpretation may be legiti- 
mate, the latter definitely is not. A 
higher terminal level could be an 
indication of a greater amount 
learned, or that amount of learning is 
the same but performance is superior. 
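
The distinction between rate of approach and asymptote can be made concrete with a simple growth curve. The sketch below assumes, purely for illustration, that performance on trial n follows A(1 - e^(-kn)), where A is the asymptote and k the rate constant. This functional form, the function names, and the numerical values are hypothetical and are not taken from any study reviewed here.

```python
import math

def performance(trial, asymptote, rate):
    """Hypothetical negatively accelerated curve: A * (1 - exp(-k * trial))."""
    return asymptote * (1.0 - math.exp(-rate * trial))

def trials_to_fraction(fraction, rate):
    """Trials needed to reach a given fraction of the asymptote.

    Under the assumed curve this depends on the rate constant only.
    """
    return -math.log(1.0 - fraction) / rate

# Two hypothetical groups: same rate constant, different asymptotes.
small_reward = {"asymptote": 2.0, "rate": 0.25}
large_reward = {"asymptote": 4.0, "rate": 0.25}

n90_small = trials_to_fraction(0.90, small_reward["rate"])
n90_large = trials_to_fraction(0.90, large_reward["rate"])

# Both groups reach 90% of their own asymptote on the same trial,
# although their terminal levels differ by a factor of two.
print(n90_small == n90_large)
print(performance(20, **small_reward) < performance(20, **large_reward))
```

Under this assumed form, equal rate constants with unequal asymptotes correspond to the pattern most of the studies below report. As the text notes, the unequal asymptotes by themselves do not show whether amount learned or level of performance differs.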

Relevant manipulations have in- 
cluded variations in incentive 
amount by all five of the operations 
outlined in the introduction. These
will be considered in turn, and, unless 
otherwise stated, the absolute 
method of incentive presentation was 
used. 


Variation in Weight and Number of 
Incentive Units 


Apparently the first study of 
quantitative variation was Grind- 
ley’s, reported in 1929. He trained 
five groups of chicks to run down a 
runway to either 0, 1, 2, 4, or 6 grains
of boiled rice. When reciprocal run- 
ning times on the last five of seven 
trials were plotted against the num- 
ber of grains of rice, an increasing, 
negatively accelerated curve was ob- 
tained. It is uncertain just what this 
curve represents, however. That it 
represents asymptotic performance 
with so few trials is questionable. 

Wolfe and Kaplon (1941) also
used chicks as Ss. Three groups were 
run successively on each of three 
problems, a runway, detour problem, 
and T maze. Running times through- 
out the 25 or 35 trials given on each 
problem fell in the following order, 
shortest to longest, for the three in- 
centive amounts used: four quarter- 
grains of popcorn, one full grain, and 
one quarter-grain. Critical ratios on 
the final training days (consisting of 
five trials) indicated no significant 
differences between groups, but sev-
eral approaching significance in com-
parisons involving the one quarter-
grain group. Inspection of the Wolfe
and Kaplon curves indicates a similar
rate of approach to the asymptotes
among all groups on all problems. 
Crespi (1942) gave runway train- 
ing to various groups of rats, with 
the number of incentive units given 
as reward varied in logarithmic steps. 
Each incentive unit weighed .02 gm.,
and the numbers of units used were
1, 4, 16, 64, and 256. Asymptotic
running speeds (Trials 21-25 in one
experiment, Trials 16-20 in another)
were an approximately logarithmic
function of the number of incentive
units. On the other hand, rate of ap-
proach to these asymptotes was ap-
proximately constant for the various
incentive values. These findings, 
that the amount of incentive affects 
performance at asymptote but not 
rate of approach to the asymptote, 
have with few exceptions been con- 
firmed in later studies using the ab- 
solute method of incentive presenta- 
tion. 
Continuing with the runway 
studies, Zeaman (1949) varied the 
weight of single incentive units as 
follows: .05-, .20-, .40-, .80-, 1.60-,
and 2.40-gm. Eighteen or 19 daily 
trials were given, and equations were 
fitted describing the decrease in log 
latency over successive trials, treat- 
ing Trials 14-19 as asymptotic. 
Equation constants representing as- 
ymptotic performance differed sig- 
nificantly from each other, while 
those representing rate of approach, 
or the slope constants, were nearly 
identical. Conclusions are again 
clear: Quantitative variation affects 
terminal level of performance but 
not rate of approach to this level. A 
plot of asymptotic log latency as a 
function of log grams of food yields a 
decreasing function with slight posi- 
tive acceleration over the range in- 
vestigated. 
Additional runway studies have 
corroborated these findings. Among 


these are the studies of Lawrence and 
Miller (1947), Metzger, Cotton, and 
Lewis (1957), and Spence (1956, pp. 
130-132). 

Two other runway studies require 
brief comment. Pereboom and Craw- 
ford (1958) measured both forward 
running time and competing response 
time over 40 trials, and found that 
incentive magnitude affects both var- 
iables, the latter more so. These re- 
sults suggest that a good deal of the 
“learning” that is shown by a de- 
crease in running time (or changes in 
related time measures) may actually 
reflect the elimination of competing 
response tendencies, such as explora- 
tion, and the like, rather than a 
marked decrease in forward running 
time. 

Gagné (1941) found an increase in 
rate of acquisition and a decrease in 
terminal log latency for eight train- 
ing trials, as a function of incentive 
amount. However, for purposes of 
this study, incentive amount was 
completely confounded with inter- 
trial interval such that the longer 
rest intervals were associated with 
larger amounts. The faster learning 
rate for larger amounts may then be 
due to the distribution of practice 
effect, and not to the amount of in- 
centive received. 

A study by Hutt (1954) utilizing 
the bar-pressing response confirms 
the runway finding that asymptotic 
performance is positively related to 
incentive magnitude. Hutt varied 
both quantity and quality of incen- 
tive factorially and assessed their ef- 
fects on rate of responding under PR. 
The three quantities were 3-, 12-, and 
50-mgm., manipulated by varying 
the size of the food dipper, and the 
three qualities were a basic diet plus 
saccharine (most preferred), basic 
diet alone, and basic diet plus citric 
acid (least preferred). The animals
had first received training under con-
tinuous reward, and cumulative re- 
sponse curves under PR are essen- 
tially linear. Thus it can be con- 
cluded that the obtained significant 
effects of both quantity and quality 
represent differences in asymptotic 
performance. The differences were 
in the expected direction in both 
cases. 

Several studies have employed 
more complex tasks, such as visual 
discriminations and T-maze prob- 
lems. Reynolds (1949) trained two 
groups of rats on a black-white dis- 
crimination with a single incentive 
unit weighing either 30 or 160 mgm. 
and found that, although mean re- 
sponse times for the two groups dif- 
fered significantly, differences in 
trials to criterion were negligible, the 
means being within one trial of each 
other. Other investigators (Hopkins, 
1955; Schrier, 1956a) have obtained 
discrimination learning results in sub- 
stantial agreement with Reynolds. 

In another study, Reynolds 
(1950a) compared acquisition of a 
simple T-maze habit over a constant 
number of trials with the incentive 
unit weighing either 30 or 160 mgm. 
He found a greater percentage of cor- 
rect responses and faster running 
times for the larger amount group. 
Inspection of his curves indicates 
similar rates of approach but dif- 
ferent terminal levels of performance. 
Coyer (1953) employed a multiple- 
unit, multiple-choice linear maze and 
four levels of amount of incentive. 
Amount failed to have a differential 
effect on either his learning measure 
(errors) or his performance measure 
(running time). Finally, Heyer 
(1951) employed a five-unit maze 
and groups of rats under either high 
or low thirst drive. The high drive 
Ss drank significantly more water in 
the goal box than did the low drive 


animals and thus received larger re- 
ward. However, the two groups 
failed to differ in terms of trials or 
errors to criterion, or times per trial. 


Variation in Duration of Incentive Ex- 
posure 


Kling (1956) employed a runway, 
the thirst drive, and water as incen- 
tive. Amount of water incentive was 
manipulated factorially with two 
levels each of duration of exposure to 
the drinking tube (15- and 120-sec.)
and drinking tube diameter (2 and 
5 mm.). Several consummatory re- 
sponse measures were obtained and 
running speeds over the last three of 
13 daily trials were found to be un- 
systematically related to the volume 
of water consumed per trial, time 
per trial actually spent drinking, and 
proportion of goal box time spent 
drinking. They were positively re- 
lated, however, to ingestion rate, a 
measure of consummatory activity. 
In a similarly designed study, Hellyer 
(1953) independently varied amount 
of water reinforcement and drinking 
tube size, and found that both vari- 
ables affected runway latencies. 
There was an inverse correlation be- 
tween duration of consummatory re- 
sponse and latency. 

Fehrer (1956) carried out two ex- 
periments, one with a U maze, the 
other with a runway, using the thirst 
drive and varying both time in the 
goal box and amount of incentive. 
Running speeds were found to vary 
systematically with neither amount 
of incentive (40-sec. drinking vs. 10- 
sec. drinking followed by an addi- 
tional 30 sec. in the now empty goal 
box) nor time in the goal box (10-sec. 
drinking vs. 10-sec. drinking fol- 
lowed by 30-sec. delay in the goal 
box). 

Spence (1956) has pointed out that 
in the majority of the studies of the
type in which weight or number of
incentive units are manipulated, du- 
ration of time spent in the goal box 
has been confounded with magni- 
tude, Ss typically being allowed to 
remain in the goal box until, and 
only until, the incentive has been 
consumed (the Fehrer study, just 
described, is an exception to this). 
Two of his students, Swisher and 
Czeh, varied magnitude and dura- 
tion independently by allowing rats 
with larger amounts time in the goal 
box equivalent to that of animals 
with smaller amounts, but then al- 
lowing them to finish eating else- 
where. These two studies, one meas- 
uring bar-pressing latencies, the 
other, runway starting speeds (re- 
ported on pp. 138-141) both pro- 
duced evidence that performance 
varies as a function of duration in the 
goal box and not amount con-
sumed. Thus, they agree with
Kling and Fehrer in the finding that
performance does not vary with the
actual amount consumed, but dis-
agree with Fehrer’s finding that per-
formance did not vary with dura-
tion either.


Variation in Concentration of Sugar 
and Saccharine Solutions 


Studies in which variation in in-
centive magnitude is achieved by
manipulating the concentration of
sugar or saccharine solutions provide
some apparent exceptions to the gen-
eralizations that rate of learning is
independent of quantity of reinforcing
agent [...] concentration of sugar
solutions was Guttman (1953).
Different groups
of rats were presented 4%, 8%, 16%,
or 32% sucrose solutions as rein-
forcement for bar pressing, and the
effect was measured on response rate
during conditioning, extinction, and
PR, and on time for original condi- 
tioning. The measures obtained dur- 
ing original conditioning are of im- 
mediate concern. Asymptotic re- 
sponse rate, and rate of approach to 
this asymptote were found to vary 
with concentration. The relation 
between rate of approach and con- 
centration was monotonically in- 
creasing, but the asymptotic level 
increased up to 16% and then fell 
off at 32%. In addition it was found 
that the time required to emit 500 
responses during acquisition was a 
decreasing function of concentra- 
tion. The two measures of acquisi- 
tion rate yield results seemingly in- 
consistent with findings previously 
reviewed. The 32% group clearly at- 
tained its terminal response rate with 
fewer reinforcements than did the 
other groups. But the asymptote 
for the 32% group is lower than that 
for either the 8% or the 16% groups. 
Guttman suggests that this lower 
asymptote might be due to uncon- 
trolled drinking behavior in competi- 
tion with bar pressing. If we are to 
reconcile the finding of faster acquisi- 
tion rate with increased quantity of 
incentive in this study, with the re- 
sults which failed to find such a rela- 
tionship, one possibility is to give it 
the status of an artifact. The non- 
monotonic asymptote is, as indicated, 
likely an artifact, and this asymptotic 
function and the different approach 
rates are probably interrelated. If 
the 32% group has less distance to go 
to reach asymptote, it will get there 
sooner.
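
The "less distance to go" argument can be illustrated numerically. The sketch below assumes a hypothetical exponential approach to asymptote with the same rate constant for every group, and asks how many reinforcements are needed to come within a fixed absolute tolerance of the asymptote. The function name, the starting level, and all numbers are illustrative assumptions, not Guttman's data.

```python
import math

def trials_to_within(start, asymptote, rate, tolerance):
    """Trials until an exponential approach is within `tolerance` of asymptote.

    Assumes the remaining gap shrinks as (asymptote - start) * exp(-rate * n).
    """
    gap = asymptote - start
    if gap <= tolerance:
        return 0.0
    return math.log(gap / tolerance) / rate

# Hypothetical groups with an identical rate constant but different asymptotes.
rate = 0.10
low_asymptote_group = trials_to_within(start=0.0, asymptote=20.0, rate=rate, tolerance=1.0)
high_asymptote_group = trials_to_within(start=0.0, asymptote=30.0, rate=rate, tolerance=1.0)

# The group with the lower asymptote arrives sooner even though the
# underlying rate constant is the same for both groups.
print(low_asymptote_group < high_asymptote_group)
```

The same arithmetic applies when groups share an asymptote but start at different levels: the group that starts higher has the smaller gap to close.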

In two studies, Young and Shuford 
(1954, 1955) investigated the effect of 
sucrose concentration on a running
response. In the first study (1954),
latency and running time were meas-
ured where concentrations were 2%,
6%, 18%, and 54%. By the end of
18 daily trials the groups had ordered
themselves in terms of latencies in 
the order indicated, greatest to least. 
But asymptotic running speeds were 
the same for all groups. Finally, these 
asymptotic speeds were achieved 
sooner the higher the concentration,
a finding paralleling Guttman’s. This 
finding is possibly artifactual also, 
because, on the initial trials, Ss with 
higher concentrations were running 
faster (all Ss had had preliminary 
training with the appropriate con- 
centration, and this can account for 
the initial gradient), and hence pre- 
sumably they had less far to go to 
reach the common asymptote. In the 
second study (1955), concentration 
was again varied in logarithmic steps 
from 2% to 54%, and 25 daily trials 
were given. Terminal running speeds 
were nonmonotonically related to 
concentration. 

Hughes (1957) varied both the 
concentration and volume of sac-
charine solution, and assessed their
effect on latency, running time, and
percentage of correct responses in a
T-maze. For Trials 2-40, the per- 
centage of correct responses increased 
significantly with both variables. 
Reciprocal latency and reciprocal 
running time increased significantly 
with volume but not concentration. 
Finally, for the percentage correct 
response measure, there was a sig-
nificant interaction between volume
and concentration, such that for the 
smaller of two volumes the effect of 
concentration was monotonic, but 
for the larger volume the effect of 
concentration was nonmonotonic. 

Finally, Smith and Duffy (1957) 
reported faster learning of a T-maze 
habit when the reward was 4 cc. 20% 
sucrose than when only .1 cc., in 
terms of increases in percentage cor- 
rect, and decreases in running time. 
Here we have another possible ex- 


ception to the generalization that in- 
centive magnitude does not affect 
rate of learning. However, their 
measures of rate of learning were 
somewhat unorthodox and no direct 
statistical comparisons in terms of 
volume were made. In the interests 
of parsimony, and until this design is 
repeated, preference is to maintain 
the original conclusion that learning 
rate is independent of incentive mag- 
nitude. 


Zero Incentive Magnitude 


A final set of three papers are re- 
lated in that, in each, one of the mag- 
nitudes employed was a zero mag- 
nitude. Furchtgott and Rubin (1953) 
used a two-unit, linear T maze and 
assigned different groups of rats to a 
single incentive unit weighing either 
0, 20, 75, 250, or 2500 mgm. No dif- 
ferences in terms of three measures 
of rate of learning (trials to criterion,
number of Ss attaining this criterion, 
number of errors during the precri- 
terial period) were found except that 
all groups receiving any positive 
amount did better than the group re- 
ceiving zero amount. However, mean 
running speeds did differentiate the 
positive amount groups, the two 
larger reward groups running faster 
than the two smaller reward groups. 

Two other studies support the find- 
ing that a positive amount leads to 
faster learning than zero amount. 
Seward, Shea, and Elkind (1958) 
found a significant interaction be- 
tween incentive amount (a 1-gm. or 
a .5-gm. pellet vs. no pellet) and 
trials, indicating a possible effect on 
rate of learning, but there was no 
improvement at all when the goal 
box was empty. Smith and Kinney 
(1956) found faster bar pressing for a 
reward of 20% sucrose solution than 
for 0% (plain water), during a single 
26-min. session.


LEARNING VERSUS PERFORMANCE
EFFECTS


The tentative conclusions arrived
[...]
present section will be concerned
with whether this difference in as-
ymptotic performance reflects a dif-
ference in amount learned, or in level
of performance independent of learn-
ing. [...]
requires a two-phase experimental
design, comprising a training and a
test phase. Training-phase per-
formance preferably should be as-
ymptotic [...]
have noted, differentiate learning
and performance effects, as either
could be present, and, if so, their
[...]
rest on the assumption [...]
effects should be relative [...]
Maher & Wickens, 1954).

An implication of these statements 
is that, if magnitude affects learning, 
the approach to the new asymptote 
under test-phase magnitude (where 
there has been a change in magnitude 


in going from one phase to the next) 
should be gradual, as it was in the 
training phase. But if magnitude af- 
fects momentary performance, the 
approach to the new asymptote 
might be either gradual or abrupt. 
The most compelling evidence that 
magnitude affects performance inde- 
pendently of learning would come 
from findings of an abrupt change in 
performance level. This is not a nec-
essary requirement, however, for it is
possible that competing response
tendencies might prevent sudden per-
formance shifts (see Pereboom,
1957a, 1957b). In contrast, if incen-
tive magnitude affects amount
learned, and if learning is not con-
strued as a one-trial affair, then the
effects of change in magnitude must
include a gradual change in level of
performance.


The earliest study to use training
and test phases, as here defined, was
Crespi’s (1942). In one of his experi-
ments, after rats trained with either
one or four incentive units had [...]
their magnitude [...] up to 16 [...]
and 256- [...]
amount, the running [...]
animals reached the [...]
trained with 16 units [...]
In the 1-to-16 [...]

Both of these overshooting effects
were significantly different from ob- 
tained or extrapolated values for the 
16-unit rats, and were labeled “ela-
tion” and “depression” effects, re-
spectively. Later investigators have
dubbed them “positive contrast” 
and “negative contrast” effects. Elli- 
ott (1928) had earlier found an effect 
similar to the negative contrast effect 
when rats trained in a multiple T 
maze with bran mash as incentive 
were shifted to sunflower seed. 
Crespi’s results indicate that the ef- 
fect of change in incentive magnitude 
is one of sudden change in perform- 
ance, and thus suggest that incentive 
magnitude affects level of perform- 
ance rather than amount learned. 

Zeaman (1949) also studied the ef- 
fect of sudden shifts in incentive mag- 
nitude upon runway behavior. 
Changes in latency were abrupt, oc- 
curring after but one exposure to the 
new incentive values. In one experi- 
ment, the change in latency was di- 
rectly proportional to the change in 
amount, and in another, based upon 
extrapolation, a positive contrast ef- 
fect was obtained, but a possible 
negative contrast effect was not sig- 
nificant. 

Another study from which infor- 
mation based on abruptness of per- 
formance change can be obtained is 
that of Spence (1956, pp. 130-132). 
Following 48 runway trials with 
either .05- or 1.0-gm. incentive, these 
values were interchanged. Again, 
changes in response strength were 
instantaneous. A significant nega- 
tive contrast effect was obtained, but 
not a positive contrast effect.

Metzger, Cotton, and Lewis (1957) 
employed a factorial design in which 
four groups of rats received either 
two or eight 45-mgm. pellets during 
the first and second 10 runway 
trials. Speed of running during the 
second (test-) phase varied only as a


function of incentive magnitude dur- 
ing that phase. In assessing contrast 
effects, this study has the obvious ad- 
vantage over the others that com- 
parisons need not be made on the 
basis of extrapolation to what per- 
formance might have been had incen- 
tive shifts not occurred, for, accord- 
ing to the design, only half of the Ss 
experienced a change in incentive 
amount. Although changes in run- 
ning time occurred within one or two 
trials following incentive shift, there 
was no evidence for either positive 
or negative contrast effects. 
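
The logic of assessing contrast effects against unshifted controls, as in a 2 x 2 design of the Metzger, Cotton, and Lewis type, can be sketched as follows. The running speeds below are invented placeholders, and the function and variable names are hypothetical; the sketch shows only which comparisons define each effect.

```python
def contrast_effects(test_speed):
    """Compare shifted groups with unshifted controls at the same test magnitude.

    `test_speed` maps (training_magnitude, test_magnitude) to mean test-phase
    speed, with magnitudes coded "small" or "large".
    """
    # Positive contrast: small-to-large group exceeds the large-to-large control.
    positive = test_speed[("small", "large")] - test_speed[("large", "large")]
    # Negative contrast: large-to-small group falls below the small-to-small control.
    negative = test_speed[("large", "small")] - test_speed[("small", "small")]
    return positive, negative

# Invented mean speeds (arbitrary units) in which test-phase speed depends
# only on test-phase magnitude, so neither contrast effect appears.
hypothetical_speeds = {
    ("small", "small"): 1.0,
    ("large", "small"): 1.0,
    ("small", "large"): 2.0,
    ("large", "large"): 2.0,
}

positive, negative = contrast_effects(hypothetical_speeds)
print(positive, negative)
```

A positive first value would indicate a positive contrast ("elation") effect, and a negative second value a negative contrast ("depression") effect. In studies lacking unshifted controls, the control terms must instead be extrapolated from training-phase performance.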

It is clear from these studies that 
quantitative variation in incentive 
affects level of performance rather 
than amount learned. But how are 
the contrast effects to be interpreted? 
Crespi and Zeaman gave their Ss 18- 
20 training trials before shifting in- 
centive values; Spence gave his Ss 
48 preshift trials. Both Crespi and 
Zeaman found positive contrast ef- 
fects but Spence did not. Spence 
argues that those found by Crespi 
and Zeaman were artifacts, due to 
the possibility that their rats had 
not had enough trials to reach a true 
asymptote before the shifts were 
introduced. His evidence, reported 
above, supports this interpretation. 
In other words, perhaps performance 
would have continued to improve 
anyway, without the increase in in- 
centive magnitude. However, that 
Metzger et al. gave only 10 preshift 
trials, fewer than any of the other ex- 
perimenters, and failed to demon- 
strate positive contrast effects, ar- 
gues against Spence’s position. Out 
of four studies (excluding the Elliott 
study) only two demonstrated each 
type of effect, and the one study 
which maintained control Ss not 
shifted yielded negative results for 
both types of effect. Before the mat- 
ter is resolved, more information is 
needed on the conditions under which 
contrast effects would and would not 
be expected to appear. Pereboom 
(1957b) has suggested that runway 
familiarity and competing explora- 
tory behavior are important variables 
in this connection. Two related pos- 
sibilities suggest themselves. First 
it should be noted that both Crespi 
and Metzger et al. employed meas- 
ures based on running time and that 
the runway used by Crespi, who ob- 
tained positive results, was five 
times as long as the one used by 
Metzger et al., who obtained negative 
results. Alley length may be an im- 
portant variable. Second, regarding 
the extent of preshift training, if ani- 
mals are given sufficient trials to 
bring them to the true “physiological 
limit,” it would be difficult if not im- 
possible for low-to-high Ss to exceed 
in performance animals already on 
the higher amount. 

One study involving the training- 
test paradigm used a learning situa- 
tion more complex than the runway. 
Maher and Wickens (1954) used a 


and either one or five pellets as in- 
centive. No differences in rate of 
acquisition, in terms of errors, were 
found, but terminal speeds (Trials 
19-20) were greater in the five pellet 
group, findings consistent with those 
reviewed in the preceding section. 
Following a lapse of 24 months, the 
maze was relearned under 22-hrs. 
water deprivation and a constant 
amount of water reward. During this 
relearning test, neither time nor errors, 
as a function of training-phase mag- 
nitude, were significantly different. 


Variation in Concentration of Sugar 
Solutions 

Following conditioning, extinction, 
and reconditioning, Guttman’s (1953) 
rats were tested under 1-min. PR for 
five daily 4-hr. sessions. Rate of re- 
sponding was an increasing loga- 
rithmic function of concentration. 
Note that in this study, the test 
phase involved, not a shift in incen- 
tive magnitude, but a shift from con- 
tinuous reinforcement to PR, and 
that the relation between magnitude 
and performance changed from non- 
monotonic to monotonic. 

Young and Shuford (1954) shifted 
the concentrations of their rats fol- 
lowing the 18 conditioning trials re- 
ported above. The general effect was 
a decrease in total time (latency plus 
running time) when concentration 
was increased and an increase in 
total time when concentration was 
decreased. The former effect was 
more pronounced than the latter, 
and this difference was attributed to 
a practice effect. 

Dufort and Kimble (1956) gave 
rats 20 training trials in a runway 
with a 10% sugar solution in one of 
five bottle caps located on the goal 
platform. Following this, they were 
split and continued under either 0% 
(extinction), 5%, 10%, or 20% con- 
centration for an additional 49 trials, 
[...] its percentage [...] abruptly, 
while the other [...] changed more 
gradually, [...] and 10% groups con- 
tinued to improve (indicating in the 
[...]) [...] groups declined in percentage 
correct. Differences in reciprocal 
running times were also assessed, and 
again the change in performance of 
the 20% [...] the other [...] terminal 
running [...] function [...] rate of ap- 
proach to these speeds was the same 
in all cases except for the 20% group, 
whose slope constant was greater 
than that of the other groups. This 
difference may be due to the fact that 
asymptotic performance had not 
been achieved before differential con- 
centrations were introduced. Dufort 
and Kimble conclude, “ ... the in- 
dication is that the effect of changing 
the amount of reinforcement is at 
least partly on performance rather 
than habit” (p. 190). We would con- 
clude that the effect is exclusively 
on performance. They also suggest, 
on the basis of the near identity of 
the 5% and 0% slopes, that extinc- 
tion be conceptualized as the limiting 
case of decreasing the amount of in- 
centive. 


RESISTANCE TO EXTINCTION 


Studies of the effect of incentive 
magnitude on resistance to extinction 
might profitably be grouped in terms 
of the extinction measure employed. 
Some investigators have used as 
their measure trials to an extinction 
criterion, while others have assessed 
performance over a constant number 
of extinction trials. One report, how- 
ever, does not give enough informa- 
tion for this classification. This was 
the study by Fitts (1940), who com- 
pared the extinction performance of 
rats following the 10 experimental 
conditions of 1, 5, 10, 20, or 30 re- 
warded bar-pressing responses with 
either .2- or 10.0-gm. incentive. For 
all numbers of rewards, resistance to 
extinction was greater following the 
larger of the two amounts. 


Trials to an Extinction Criterion 


Three studies which employed a 
measure of trials to a specified ex- 
tinction criterion found unsystematic 
effects of prior incentive magnitude, 
while one found a systematic effect. 
Thus, Lawrence and Miller (1947) 
found an insignificant difference in 
trials to running response extinction 
criteria of either 3- or 5-min. latency 
as a function of prior amount (one or 
four pellets). Reynolds (1950b) ad- 
ministered rewards to three groups 
of rats for 25 consecutive bar-pressing 
responses. The three groups received 
one 60-mgm. pellet, two such pellets, 
or one 160-mgm. pellet for each re- 
sponse. No significant differences 
were obtained between any of the 
groups in terms of responses to a no- 
response criterion of 5 min. In a re- 
lated study, Reynolds, Marx, and 
Henderson (1952) obtained no sig- 
nificant differences in trials to ex- 
tinction of a bar-pressing response 
following 120 vs. 30 mgm. reward. 

Young and Shuford (1955) extin- 
guished their Ss with distilled water 
following the 25 training trials previ- 
ously mentioned. The slope of the ex- 
tinction curves of running speeds 
varied directly with prior concentra- 
tion. Nevertheless, Ss with lower 
concentrations reached an extinction 
criterion sooner than those with 
higher concentrations, perhaps be- 
cause they had less far to go to reach 
this criterion. 


Performance over a Constant Number 
of Extinction Trials 


Zeaman (1949) subjected the Ss of 
his various experiments to extinction 
of the running response after they 
had completed their rewarded train- 
ing, either with or without test- 
phases intervening between training 
and extinction. In all three of his 
experiments, the effect of previous 
magnitude was to alter the rate of ap- 
proach to a final common perform- 
ance level. Gagné (1941) reported a 
higher terminal extinction level (after 
only five extinction trials) for greater 
reward amounts, confounded with 
longer intertrial intervals. Metzger 
et al. (1957) extinguished their Ss 
after the training and test phases con- 
sidered earlier. Using running times 
during Extinction Trials 2-11, or Ex- 
tinction Trial 2 alone, training-phase 
magnitude had no effect. However, 
test-phase magnitude had a signifi- 
cant effect on both measures, greater 
resistance following the larger 
amount. 

Following PR under three levels 
each of quantity and quality, Hutt’s 
(1954) rats were given two 3-hr. ex- 
tinction sessions. Responses during 
extinction varied as a function of 
both variables, more responses being 
emitted following PR with larger 
amounts and preferred qualities. 
Guttman (1953) found rate of re- 
sponding during an initial 5-min. of 


10-sec. drinking followed by 30-sec, 


learning effects are relatively per- 
manent while performance effects are 
momentary, we would be led to con- 
clude, on the basis of extinction per- 
formance differences found for a con- 
stant number of trials, that magni- 
tude of reward does affect learning 
when the absolute method is used, 


variance upon their extinction scores, 
where performance levels at the be- 
ginning of extinction were equated 
in terms of running times during the 
last five reinforced trials. Under 
these circumstances, the effect of 
test-phase amount on extinction dis- 
appeared. They conclude, “... re- 
ward affects performance on extinc- 
tion through differential levels of 
performance just prior to extinction 
rather than affecting performance on 
extinction directly” (p. 188). If we 
apply this interpretation to the 
other studies in which performance 
was measured over a constant num- 
ber of extinction trials, the conclusion 
that incentive magnitude does not 
affect amount of learning is main- 
tained. This interpretation might 
also account for the apparently 
anomalous results of Young and 
Shuford (1955) and Fehrer (1956) on 
trials to extinction criterion. 


INTERACTION OF INCENTIVE 
MAGNITUDE WITH OTHER 
VARIABLES 


Several investigators have manip- 
ulated incentive magnitude factori- 
ally with other variables. While 
their results agree with previous con- 


tive magnitude with quality of re- 
ward, drive level, and partial rein- 


Quality of Reward 


Hutt (1954) found that the effects 
of quantity and quality of reward on 
rate of bar pressing during both PR 
and extinction were independent of 
each other, the interaction F ratio in 
the former case being less than one. 


Seward, Shea, and Elkind (1958) 
factorially varied incentive magni- 
tude and length of food deprivation, 
and assessed their effect on running 
speed. Both main effects were sig- 
nificant, as was the interaction be- 
tween them. But this interaction is 
based on comparisons involving zero 
reward amount and zero hours dep- 
rivation. No learning took place 
under satiation or when the goal box 
was empty. That the interaction 
might represent a special case found 
only when values of either incentive 
magnitude or length of deprivation 
are zero, as suggested by Seward et 
al., is attested by a recent study by 
Reynolds and Pavlik (1958). They 
varied incentive magnitude (.1, 1.0, 
and 2.0 gm.) and deprivation time 
(3, 22, and 44 hours) factorially over 
72 runway trials and reported re- 
ciprocal latencies over the last 20 of 
these trials. Differences as a func- 
tion of both incentive amount and 
deprivation were in the expected 
direction and significant. However, 
the interaction between these two 
variables was not significant (F 
<1.00). 

Reynolds et al. (1952), in contrast, 
reported a significant interaction be- 
tween amount of incentive and drive 
level on trials to an extinction crite- 
rion of 5-min. with no bar-pressing, 
such that high drive-high reward and 
low drive-low reward animals ex- 
tinguished more rapidly than high 
drive-low reward or low drive-high 
reward animals. Reynolds and Pav- 
lik suggest that this difference might 
be a function of the different response 
measures employed in the two studies. 
Another possibility lies in the defini- 
tion of drive. For Reynolds and Pav- 
lik, drive was defined in terms of the 
length of food deprivation. In con- 
trast, Reynolds et al. manipulated 
both amount fed and deprivation 
time simultaneously. In any event, it 
would seem that more research is 
needed to specify the conditions of 
both drive and reward under which 
an interaction between them would 
be expected. 


Partial Reinforcement 


Hulse (1958) found a greater run- 
ning speed over the last nine of 25 
daily runway trials when the incen- 
tive was a 1.0 gm. food pellet than 
when it was a .08 gm. pellet, and 
when reinforcement was continuous 
than when appearing on only 46% of 
the trials. He also measured running 
speeds over a constant number of ex- 
tinction trials and found a significant 
interaction between the effects of 
prior percentage of reinforcement 
and amount of reinforcement such 
that, with partial reinforcement, 
resistance to extinction was greater 
following a larger amount, but with 
continuous reinforcement, the re- 
verse effect obtained. 


MECHANISMS OF REINFORCEMENT 


In the introduction, four possible 
mechanisms of quantitative varia- 
tion were outlined: amount of nutri- 
ent material available for assimila- 
tion, preconsummatory stimulation, 
consummatory activity, and con- 
summatory stimulation. It was 
further indicated that different ex- 
perimental operations produce dif- 
ferent combinations of variation. 
Thus, decisions regarding mecha- 
nisms of reinforcement must involve 
comparisons of experiments utilizing 
the different operations. 

We are actually concerned here 
with the conditions of reinforcement, 
and a distinction must be made be- 
tween necessary and sufficient condi- 
tions. If quantitative variation is 
shown to affect behavior when one 
of the mechanisms is held constant, 
this would indicate that variation in 
that mechanism is not necessary for 
incentive magnitude effects. On the 
other hand, lack of behavioral varia- 


mechanism is sufficient. But if there 
is no behavioral variation, it must be 
that that mechanism is not sufficient, 
or that a necessary variation (if there 
are any) is lacking. Conclusions re- 


Preconsummatory Stimulation 
That variati 


posure, or concentration of sugar or 
saccharine are manipulated. In all of 
these cases, preconsummatory stim- 
ulation is presumably constant, and 
yet behavioral variations, previously 


McKelvey (1956) 
he duration of in- 


ameter), during 
black-white discrimination. Dura- 
tion of reward affected the perform- 
ance measure, running time, but not 
the acquisition measure, errors. But 
by neither measure was preconsum- 
matory stimulation shown to have 
any systematic effect on behavior. 
This finding is corroborated in a 
study of delayed response perform- 
ance by chimpanzees using a modi- 
fied differential method, by Cowles 
and Nissen (1937). In one condition 
of their experiment, the chimps were 
shown either a large piece of orange 
without skin or a small piece with 
skin (the skin differences were used 
to equate eating times), but after 
the delay they always found the 
smaller. In the other condition, the 
chimps found the size they had been 
shown prior to the delay. Thus, with 
the first procedure, preconsumma- 
tory stimulation is varied but the 
amount remains constant; with the 
second procedure the two factors are 


Procedure, difference 


pected direction and, for the most 
part, statistically significant. 


volving the 


d a the magni- 
incentives which do not 


ilation is not a necessar 
reinforcement, A 


The pioneer study with saccharine 
incentives is that of Sheffield and 
[...]. This was essentially a 
demonstrational study, designed to 
determine if saccharine could act ef- 
fectively as a reinforcer. Although 
three experiments were conducted, 
only the third will be described here. 
For this experiment, rats learned a 
position habit in a T maze. Food- 
deprived Ss were given 42 trials with 
saccharine solution in one arm of the 
maze and tap water in the other. 
There developed a significant in- 
crease in choices of the saccharine 
side, as well as an increase in rate of 
ingestion and decrease in running 
time. 

The study of Hughes (1957), dis- 
cussed in an earlier section, indicates 
that, not only can rats learn in the 
absence of reinforcement by nutrient 
material, but that performance is dif- 
ferentially affected by saccharine 
concentration. 

Two studies using sexually moti- 
vated male rats as Ss with receptive 
females as incentive also indicate that 
behavior may be modified in the ab- 
sence of alterations in the effects of 
deprivation. Sheffield, Wulff, and 
Backer (1951) compared running 
speeds when the incentive was either 
a female rat in heat or another male. 
Even though ejaculation was not per- 
mitted, the Ss ran faster to the recep- 
tive female than to the male incen- 
tive. Kagan (1955) found a greater 
percentage of correct responses and 
faster running speed in a T maze 
when the incentive was copulation 
with ejaculation than when only in- 
tromission was permitted. Perform- 
ance under both of these conditions 
was in turn superior to that when 
only mounting was allowed. 

Several studies, already discussed, 
indicate that, when incentives are 
administered peripherally, variation 
in amount of nutrient material is not 
a sufficient mechanism of reinforce- 
ment. Kling (1956) found running 
speeds to be unrelated to volume of 
water consumed per trial, and 
Fehrer (1956) also found running 
speeds to be independent of amount 
of water reinforcement. Swisher and 
Czeh (Spence, 1956) found perform- 
ance to be unrelated to amount of 
food consumed. 

It should be emphasized that the 
conclusion that nutrient material, 


and hence that variations in amount 
of nutrient material is not a sufficient 
condition, applies only to situations 
in which incentives are administered 
peripherally. If we include evidence 
from experiments in which peripheral 
factors are by-passed, then the con- 
clusion must be that variation in nu- 
trient material is a sufficient condi- 
tion. See for example, the studies of 
Coppock and Chambers (1954), and 
Miller and Kessen (1952). But the 
remaining evidence will indicate that, 
when incentives are peripherally ad- 
ministered, the reinforcing mecha- 
nism is itself peripheral. 


Amount of Consummatory Activity 


When incentive magnitude is var- 
ied by manipulating the concentra- 
tion of sucrose solutions, the amount 
of consummatory activity is presum- 
ably held constant. The studies re- 
viewed earlier indicate that perform- 
ance does vary as a function of su- 
crose concentration, so it would seem 
that variation in amount of consum- 
matory activity is not a necessary 
mechanism. 

However, variation in consumma- 
tory activity does seem to be a suffi- 
cient condition. A number of studies 
in which ingestion rate was measured 
would so indicate. For example, Shef- 
field, Roby, and Campbell (1954) 
found a high positive correlation be- 
tween running speed and ingestion 
rate (incentives were water, sac- 
charine, dextrose, or dextrose plus 
saccharine solutions), direct evidence 
that performance is an increasing 
function of amount of consummatory 
activity. The Kling (1956) study also 
indicates that the important consum- 
matory activity variable is ingestion 
rate. 

Supporting evidence comes from 
the finding of Sheffield, Wulff, and 
Backer (1951) of a positive relation- 
ship between running speed and per- 
centage of opportunities to attempt 
copulation during which the attempt 
was actually made. 


Amount of Consummatory Stimulation 


When duration of incentive ex- 
posure is manipulated, the nature 
of the consummatory stimulation is 
held constant (although its duration 
may vary). Several studies of this 
type (e.g., Kling, 1956; Spence, 1956) 
have found performance variations, 
suggesting that variations in con- 
summatory stimulation are not nec- 
essary. 

That they are sufficient, however, is 
indicated in a study by Cockrell 
(1952), similar in design to Gutt- 
man’s (1953), but manipulating con- 
centration of saccharine rather than 
of sucrose solutions. By his pro- 
cedure, the only mechanism assumed 
to vary was consummatory stimula- 
tion (taste). During conditioning 
under continuous reinforcement, bar- 
pressing rates reached nonmonotonic 
asymptotes as a function of concen- 
tration. During extinction and PR, 
rate of responding was approximately 
a linear function of the logarithm of 
saccharine concentration. 

We may close this section with the 
tentative conclusion that none of the 
mechanisms of reinforcement are nec- 
essary for bringing out performance 
differences as a function of incentive 
magnitude, but that variations in 
either amount of consummatory ac- 
tivity or stimulation associated with 
that activity are sufficient conditions. 
By way of qualification, it should be 
pointed out that there is some diffi- 
culty in manipulating consummatory 
stimulation while holding consumma- 
tory activity constant (cf. Guttman, 
1953). Thus it may be that only the 
concurrent variation in consumma- 
tory activity and consummatory 
stimulation will prove to be sufficient. 
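
The inferential rules used throughout this section can be stated compactly. The following Python sketch is offered only as an illustration of the reasoning: the function name `classify`, the record layout, and the three example records are hypothetical and do not encode the data of any study reviewed here.

```python
def classify(mechanism, observations):
    """Apply the necessary/sufficient rules to one mechanism.

    Each observation is a dict with:
      'varied'          - set of mechanisms that varied in the experiment
      'behavior_varied' - True if performance differed between conditions
    """
    # Behavior varied although this mechanism was held constant:
    # variation in it cannot be necessary.
    not_necessary = any(
        mechanism not in obs["varied"] and obs["behavior_varied"]
        for obs in observations
    )
    # Behavior varied when this mechanism alone varied:
    # variation in it is sufficient.
    sufficient = any(
        obs["varied"] == {mechanism} and obs["behavior_varied"]
        for obs in observations
    )
    return {"not_necessary": not_necessary, "sufficient": sufficient}


# Hypothetical experiments, invented for illustration only.
observations = [
    {"varied": {"activity"}, "behavior_varied": True},
    {"varied": {"stimulation"}, "behavior_varied": True},
    {"varied": {"nutrient"}, "behavior_varied": False},
]

result = {
    name: classify(name, observations)
    for name in ("nutrient", "activity", "stimulation")
}
print(result)
```

Note that a record in which two mechanisms vary together can never establish sufficiency for either one under this rule, which is the difficulty raised in the qualification above.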


THE COMPARISON OF INCENTIVE 
MAGNITUDES: THE DIFFEREN- 
TIAL METHOD 


The term differential, as used here, 
will be given a somewhat broader 
meaning than that originally assigned 
by Lawson (1957). Several differen- 
tial procedures may be distinguished: 

1. Single Problem Method— 
Throughout the experiment there is 
but a single task or problem before 
the S, but in different stages of the 
experiment he receives different 
amounts, for example, according to a 
latin square design. 

2. Simultaneous Discrimination— 
Selective learning is involved such 
that one of two responses is followed 
by a greater amount, the other by a 
lesser amount. S must learn to dis- 
criminate magnitudes. 

3. Successive Discrimination—On 
successive trials, S makes one re- 
sponse followed by a larger amount, 
or a second response followed by a 
smaller amount. 

4. Successive Problems—A series 
of problems are to be learned, each 
problem correlated with a different 
incentive amount. The “learning 
sets” paradigm (Harlow, 1949) pro- 
vides a familiar example. 

It will be noted that in some cases 
the distinction between absolute and 
differential methods will appear arbi- 
trary. Certainly the previously dis- 
cussed training-test-phase design un- 
der the absolute method involves ex- 
posure by the same S to more than 
one incentive amount. However, 
separate measurements were made 
under each phase of the experiment, 
and statistical analyses were sepa- 
rate. In contrast, studies to be re- 
viewed in this section are ones in 
which a single analysis is performed 
on data obtained under variable in- 
centive conditions for the same S. 


Single Problem Method 

Gantt (1938; also reported in Hull, 
1943, p. 125) presented a curve of 
asymptotic (?) conditioned saliva- 
tion in a single dog as a function of in- 
centive magnitude: ½, 1, 2, and 12 
grams of food. The curve represents 
an increasing, negatively accelerated 
function. It is interesting to note 
that this is apparently the only study 
of incentive magnitude in the litera- 
ture based upon the classical condi- 
tioning procedure. 

Nissen and Elder (1935) varied 
both the size of a single piece of ba- 
nana and the number of constant- 
weight pieces and assessed the effect 
on the limits of delayed response at 
an accuracy of 80% or better. The 
total number of grams varied from 3 
to 20, and increases in delay limits 
accompanied increases in incentive 
amount, and the reverse. A persev- 
eration effect—the opposite of con- 
trast effects—was also noted. A sec- 
ond delayed response study (Cowles 
& Nissen, 1937) has previously been 
cited. They found fewer errors and 
shorter latencies with large than with 
small incentives. 

Fletcher (1940) assessed the effects 
of incentive magnitude (length of 
banana slice) on the performance of 
pulling responses by chimpanzees 
and found a positive relationship be- 
tween frequency of response and in- 
centive length. In addition, various 
time measures were significantly af- 
fected, among them, response la- 
tency, pulling time, and the number 
of tugs per trial. 

At this point, Tinklepaugh’s early 
study (1928) might be mentioned. 
He noted in reward substitution ex- 
periments that when monkeys ob- 
served the experimenter place two 
pieces of food under a cup, but later 
found only one piece, certain emo- 
tional and searching responses en- 


sued. This behavior did not occur 
when one small piece was substituted 
for one large piece. 

Michels (1957) used a latin square 
design to assess the effects of amount 
of reinforcement (.5, 1, 2, or 4 pea- 
nuts) on latency of response to a 
single test object. The WGTA was 
used and the rhesus monkeys who 
served as Ss were on a 20% partial 
reinforcement schedule throughout. 
Latencies with the .5-peanut reward 
were significantly greater than with 
the three larger rewards, which did 
not differ reliably from each other. 
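
A latin square of the kind used here can be generated mechanically. The sketch below, in Python, uses a simple cyclic rotation; this is one convenient construction and is not claimed to be the particular square Michels employed. The function name and the reading of rows as Ss (or groups of Ss) and columns as successive stages are our own assumptions.

```python
def cyclic_latin_square(treatments):
    """Return an n x n latin square built by cyclic rotation of the treatments."""
    n = len(treatments)
    return [
        [treatments[(row + col) % n] for col in range(n)]
        for row in range(n)
    ]


# Peanut amounts from the text. Rows stand for hypothetical Ss (or groups
# of Ss); columns stand for successive stages of the experiment.
AMOUNTS = [0.5, 1, 2, 4]
square = cyclic_latin_square(AMOUNTS)
for row in square:
    print(row)
```

Every amount appears once in each row and once in each column, so each S receives every amount and each amount occupies every ordinal position equally often.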

The remaining studies utilizing 
the single problem method are all 
operant conditioning studies. Jenkins 
and Clayton (1949) allowed a group 
of pigeons either 2-sec. or 5-sec. ex- 
posure time to the food magazine in 
a counterbalanced order during PR 
and found a faster rate of key-peck- 
ing with the longer duration. Roughly 
twice as many consummatory re- 
sponses were emitted during the 5- 
sec. exposure than during the 2-sec. 
exposure. 

Following the training previously 
described, 20 of Guttman’s (1953) 
rats were given further PR training, 
each S under each of the four sucrose 
concentrations used. A linear rela- 
tionship was found between rate of 
responding and the logarithm of con- 
centration, as had been found with in- 
dependent groups under PR. In a 
second study, Guttman (1954) com- 
pared the reward values of various 
sucrose and glucose concentrations, 
varied in equal logarithmic steps 
from 2% to 32%. For both sucrose 
and glucose, the relationship be- 
tween rate of bar pressing under an 
aperiodic schedule, and log concen- 
tration, was linear. Throughout, 
rate was higher with sucrose than 
with the corresponding glucose value. 
Two latin squares were used such 
that any S experienced all concentra- 
tions of one substance but not the 
other. 
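
What a linear relation between rate and log concentration amounts to can be shown with a small numerical sketch in Python. Only the end points (2% and 32%) and the equal logarithmic spacing come from the text; the choice of five steps, the constants `A` and `B`, and the response rates derived from them are hypothetical values generated for illustration and are not Guttman's data.

```python
import math


def equal_log_steps(start, stop, n):
    """Return n concentrations from start to stop, equally spaced on a log scale."""
    ratio = (stop / start) ** (1 / (n - 1))
    return [start * ratio ** i for i in range(n)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Five steps is an assumption; it gives successive doublings of concentration.
concentrations = equal_log_steps(2.0, 32.0, 5)
log_conc = [math.log10(c) for c in concentrations]

# Hypothetical rates produced by an exact log-linear rule (illustration only).
A, B = 5.0, 10.0
rates = [A + B * x for x in log_conc]

intercept, slope = fit_line(log_conc, rates)
print(concentrations, intercept, slope)
```

Under such a rule each doubling of concentration adds the same increment to response rate, which is why equal logarithmic steps are the natural spacing for this kind of study.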

Conrad and Sidman (1956) inves- 
tigated the effect of sucrose solution 
concentration on bar pressing by 
rhesus monkeys. Each S experienced 
each of seven concentrations, rang- 
ing from 0% to 60%. The maximum 
rate occurred between 15% and 
30%, with a decline at 60%. Ver- 
have (1956) assessed the effects of 
seven different sucrose concentra- 
tions presented to the same rats on 
response latency and rate for a two- 
member chain (pulling responses). 
Concentrations varied from 2% to 
32%. Response rate increased and 
latency decreased as a function of 
concentration, both functions reach- 
ing their asymptote in the neighbor- 
hood of 20-21% concentration. 

Collier and Siskel (1959) and Col- 
lier (1958) have attempted an ex- 
amination of the factors producing 
nonmonotonic incentive functions, 
which have been found with either 
the absolute or the differential 
method. Collier and Siskel varied 
sucrose concentration (4%, 8%, 16%, 
and 32%) and PR interval (.5-, 1-, 2-, 
and 4-min.) factorially and assessed 
their effects on bar-pressing rate in 
rats. Both main effects, as well as 
their interaction, were significant. 
The nature of the interaction was 
such that the nonmonotonicity of the 
obtained concentration function was 
itself a decreasing function of the 
inter-reinforcement interval. Collier 
(1958) discusses these and other data 
and concludes that a nonmonotonic 
relationship will most likely be found 
under any of the following condi- 
tions: an extended test-session, large 
volume (Hughes’ [1957] study veri- 
fies this assertion), high concentra- 
tion, or short time between reinforce- 
ments. 


Simultaneous Discrimination 

Festinger (1943) allowed rats 10- 
sec. exposure to whole wheat grain in 
one arm of a discrimination box and 
1-sec. exposure in the other arm. By 
the end of 96 trials, the rats were re- 
sponding on free-choice trials to the 
side with the greater exposure time 
at an above-chance level. Denny and 
King (1955) replicated the study 
with .7-gm. reward in one goal box of 
a T maze and .1-gm. in the other. By 
the end of 84 training trials, a pref- 
erence for the larger reward side had 
developed. Discrimination training 
was followed by reversal training, 
with the large- and small-reward 
sides interchanged. By the end of 72 
reversal trials the Ss were running to 
the opposite side with a frequency 
greater than chance. 

Pereboom (1957a) gave rats two 
free- and two forced-choice trials per 
day in a T maze for either 10 or 21 
days. Five food pellets were found 
on one side, one on the other. Follow- 
ing this training, the large and small 
rewards had their sides interchanged 
until a total of 35 days had been com- 
pleted. As a third stage of the experi- 
ment, for an additional 12 days the 
reward amounts were equalized on 
the two sides. By the end of the 
initial 10 training days, all Ss were 
responding consistently to the large 
reward side. Reversal, as in the 
Denny and King study, was gradual, 
but complete by the end of reversal 
training. During the subsequent re- 
ward-equalization phase, the Ss 
tended toward chance performance. 
The gradual reversal found by both 
Denny and King, and Pereboom, is 
to be contrasted with the abrupt per- 
formance changes found when incen- 
tive magnitudes are altered in the 
runway. Pereboom interprets this 
inconsistency in terms of competing 
exploratory behavior which retards 
reversal learning, and would be ex- 
pected to be greater in the more com- 
plex T maze than in the runway. 

Wike and Barrientos (1957) held 
amount of nutrient material and 
consummatory stimulation constant 
while allowing only duration of con- 
summatory activity to vary. This 
was achieved by varying the di- 
ameter of a drinking tube from which 
the water incentive (for thirsty rats) 
was to be obtained. With the smaller 
diameter more consummatory ac- 
tivity was required to attain a con- 
stant amount of water. The two di- 
ameters were pitted against each 
other in opposite arms of a T maze 
and the rats were required to learn a 
position discrimination on this basis. 
By the end of the 27 daily trials 
given, 85% of responses were to the 
side with the smaller diameter, a 
level significantly above chance. 
These results nicely confirm an ear- 
lier conclusion that variations in 
neither amount of nutrient material 
nor consummatory stimulation are 
necessary, but variation in consum- 
matory activity is a sufficient mecha- 
nism. 


Successive Discrimination 


D'Amato (1955) gave rats training 
on a successive discrimination prob- 
lem in a runway such that on half 
the trials a larger incentive was asso- 
ciated with a goal box of one color 
while on the other half a smaller in- 
centive was associated with a dif- 
ferently colored goal box. By the end 
of 70 training trials, the Ss were run- 
ning significantly faster for the 
larger reward than for the smaller. 

Greene (1953) employed a design 
utilizing successive discrimination 
during a training-phase and simul- 
taneous discrimination during a test- 
phase. By a factorial design it was 
possible to assess learning vs. per- 
formance effects. During the train- 
ing-phase, rats found either a large 
pellet in a black goal box and a small 
pellet in a white one, or the reverse. 
During the test-phase, either a large 
or a small reward was found in the 
black arm of the apparatus, the 
white arm being empty. Number of 
errors and number of trials to crite- 
rion during the test-phase were sig- 
nificantly less for the animals having 
the larger reward associated with 
black during the training-phase, but 
performance did not vary systemati- 
cally as a function of contemporary 
amount. This result is flatly counter 
to test-phase results reported earlier 
using the absolute method. Here we 
have evidence for a persistent effect 
of training-phase amount on later 
performance, which can be inter- 
preted to mean that, with the dif- 
ferential method, amount of incen- 
tive affects learning. This conclusion 
will be verified in subsequently dis- 
cussed papers. 

Powell and Perkins (1957) manipu- 
lated duration of incentive exposure 
during successive discrimination 
training and found it to influence 
simultaneous discrimination test- 
phase performance, confirming 
Greene’s results. 


Successive Problems 


Experiments in which Ss are 
trained on a number of consecutive 
problems, with different incentive 
magnitudes associated with different 
problems, yield clear-cut evidence 
that incentive magnitude affects 
learning. 

The first, and probably the most 
significant, of these experiments was 
published by Meyer in 1951. Meyer 
trained eight rhesus monkeys, highly 
sophisticated on several tasks includ- 
ing discrimination reversal, on 4 
series of 64 different discrimination 
reversal problems in the WGTA. 
Each problem used a single pair of 
objects throughout an original learn- 
ing phase and four reversals 
(ABABA). Response to the correct 
object yielded a constant amount of 
incentive for any given reversal but 
varying amounts over the four re- 
versals of that problem. Statistical 
analysis was of reversal errors as a 
function of the following factors: 
practice, or number of prior prob- 
lems; reversal number within a prob- 
lem; prereversal reward (one or three 
pieces of raisin or peanut); postre- 
versal reward (one or three incentive 
units); prereversal criterion (two or 
four successive correct responses); 
and postreversal criterion (same). 
The critical comparisons for the pres- 
ent context are those involving pre- 
and postreversal amounts, and stage 
of practice. 

Both pre- and postreversal amount 
affected reversal performance, and 
the first-order interactions between 
pre- and postreversal amount, and 
between postreversal amount and 
practice, were significant. What this 
means is that both prereversal 
amount (training-phase amount) and 
postreversal (test-phase 
per- 
The learning of discrim- 


experiment with a considerable 
amount of reversal learning experi- 


nges in per- 


Were 1,402, on 
- So it would 


independently of learning. The sig- 
nificant interactions imply that the 
effects of pre- and postreversal 
amounts are not independent of each 


other. However, this nonindepend- 
ence was apparent on early problems 
only. By the termination of the ex- 
periment, the four treatment com- 
binations of pre- and postreversal 
amounts were ordered in their effect, 
most to least errors, as follows: 3-1, 
1-1, 3-3, and 1-3. This ordering indi- 
cates possible long-lasting contrast 
effects, both positive and negative, 
that are slow to develop. The meas- 
ure Meyer employed was errors to 
criterion, a measure we have taken to 
represent effects on learning. Thus, 
the results of the experiment taken 
as a whole suggest the following con- 
clusion: Incentive magnitude affects 
learning when Ss experience different 
incentive magnitudes, but the Ss 
must first learn to discriminate mag- 
nitudes. Once magnitude discrim- 
ination is achieved, differential ef- 
fects on problem learning emerge. 

This conclusion receives support 
from a more recent study by Schrier 
and Harlow (1956). They used color 
discrimination problems to assess the 
effects of incentive amount (one, two, 
or four 2.2-gm. pellets), problem dif- 
ficulty, and practice, on a learning 
measure, percentage of correct re- 
sponses, in Java monkeys. All main 
effects were significant, as was the 
practice by amount interaction. This 
reflects the finding that performance 
begins at about the same level for all 
amounts, but subsequently the learn- 
ing-set curves diverge, so that dif- 
ferent asymptotes are approached. 
Schrier and Harlow say, “... that a 
learning process is involved which in- 
fluences the perception of, and in 
turn, the response to, varied amounts 
of incentive, and that such learning is 
independent of discrimination learn- 
ing per se” (p. 120). 

Meyer and Harlow (1952) investi- 
gated the effects of incentive amount 
(1, 2, 3, or 4 units) on delayed re- 
sponse performance of rhesus mon- 
keys, using the WGTA. Percentage 
of errors decreased with increasing in- 
centive amounts, and the incentive 
function changed with practice from 
one with slight positive acceleration 
to negative acceleration. 

Davis (1956) reported results of 
two experiments on problem solving 
in monkeys. He found that an in- 
crease in incentive amount led to an 
increase in balks (i.e., decrease in per- 
centage of responses), but no change 
in errors, on visual size discrimina- 
tion problems. The unexpected effect 
on balks was apparently due to satia- 
tion effects with the larger amounts. 
In a related study, a reduced-cue 
problem, three raisins produced fewer 
Trial 2 errors than 1, but there were 
no significant differences in the num- 
ber of balks. 

Finally, Leary (1958) compared 
four conditions of reward in their ef- 
fect on serial discrimination learning 
in rhesus monkeys. The four condi- 
tions, orthogonal to four 10-pair lists 
in a Greco-Latin square, were: two pea- 
nuts for all pairs of a list; one-half 
peanut for all pairs; two peanuts for 
five pairs, one-half for the other five; 
and two peanuts for two pairs, one- 
half peanut for the remaining eight. 
The two homogeneous reward condi- 
tions led to performance superior to 
that with heterogeneous reward, the 
difference attributable primarily to 
differences in errors on small reward 
pairs under the two conditions. 


Paired-Comparisons Studies 


Two studies, although employing 
variations on the differential method, 
do not fit into any of the above cate- 
gories. These are paired-comparisons 
studies of food preference as a func- 
tion of magnitude. Harlow and 
Meyer (1952) gave seven Day 
heen Rad 400 WGTA trials in 
which they were to choose between 


two peanut amounts: hh 1 8084 


peanuts. This gave 10 pairs. Result- 
ing percentage choice scores were 
converted to scale values by a 
method similar to Thurstone’s Case 
III, resulting in an increasing linear 
relationship between scale value and 
log incentive amount. Fay, Miller, 
and Harlow (1953) had their mon- 
keys choose between a preferred qual- 
ity (peanut or bread) and a nonpre- 
ferred quality (potato), when each of 
five values of the preferred was paired 
with each of two values of the non- 
preferred. They found that, as the 
ratio of preferred to nonpreferred 
amount increased (from 1/64 to 1/2), 
the choices of the preferred food in- 
creased. Further, the percentage 
choice depended, not upon absolute 
amounts compared, but upon their 
ratios. 


Comparison of Absolute and Differential Methods 


Several studies afford a more direct 
comparison of the absolute and dif- 
ferential methods than has been pos- 
sible in studies reviewed so far, in 
that the two methods are compared 
within a single experimental design. 

Logan, Beier, and Ellis (1955) com- 
pared runway speeds during acquisi- 
tion when the incentive was nine 45- 
mgm. pellets for one group of rats, 
five such pellets for a second group, 
and, for a third group, nine pellets on 
a random half of the trials and one on 
the other half. Thus, the first two 
groups were trained according to the 
absolute method, the third according 
to the differential. Over-all running 
speed (60 daily trials) was greater 
for the first group than for either of 
the other two. Inspection of their 
Fig. 2 (p. 264) indicates similar rates 
of approach to terminal speeds in all 
cases. 

In a follow-up study, Logan, Beier, 
and Kincaid (1956) report the extinc- 
tion results of this and a related ex- 
periment. For the first experiment, 
none of the three groups differed in 
their relative rates of extinction. In 
the related experiment, extinction 
followed 60 acquisition trials with 
nine pellets on each trial, nine vs. 
zero each on a random half, or nine 
vs. one. In this case the two varied- 
amount (differential method) groups 
were superior to the constant-amount 
group. The conclusion would seem 
to be that relative rate of extinction 
is retarded by previous comparison 
of incentive values. 

Lawson (1957) assessed the effects 
of incentive magnitude on discrim- 
ination learning by having rats learn 
two problems concurrently, a black- 


gray discrimination, gray being nega- 
For some Ss 


of errors over a constant number of 
In contrast, the differential 
groups showed significantly fewer er- 


Two studies similar in design to 
Lawson’s are reported by Schrier 


only one of the four magnitudes. In 
the first study (1956b), effects were 
measured on the latency of response 
to a single object covering a baited 
food well. By either method, per- 


formance improved as incentive 
amount increased. However, the ab- 
solute and differential methods did 
not produce differences in over-all 
performance, and the slopes of the 
functions relating incentive magni- 
tude to latency did not differ signifi- 
cantly. In the second study (1958), 
effects on errors during a series of 
discrimination problems were as- 
sessed. Again, the two groups did 
not differ in over-all performance. 
However, the slope of the incentive 
function was significantly greater for 
the differential method than for the 
absolute method. 


ACQUIRED REWARD VALUE AS A 
FUNCTION OF INCENTIVE 
MAGNITUDE 


Experimental studies of secondary 
reinforcement were recently reviewed 
by Myers (1958). On the basis of the 


1955; Hopkins, 1955; Lawson, 1953), 
Myers concluded that, “i 
fect of amount of reward is so slight, 
that it can only be detected when the 
S is forced to choose between sec- 
ondary reinforcers previously asso- 
ciated with different sized rewards” 
ding was that an ef- 
fect appeared only when training was 
by the differential method, but not 


Two studies have appeared since 
Myers’ review. In the first of these 


secondary reward strength. By 
neither method was a differential ef- 
fect of reward amount on 
secondary reward strength demon- 
strated. 

However, a study by Butter and 
Thomas (1958), using the absolute 
method, indicates that perhaps My- 
ers’ conclusion should be revised. 
During training, a click was paired 
with the presentation of either 8% 
or 24% sucrose solution, and during 
testing, bar pressing produced the 
click. The 24% group emitted sig- 
nificantly more responses during the 
test than did the 8% group, indicat- 
ing a differential effect of magnitude 
of primary reward on the strength of 
secondary reward. Butter and 
Thomas suggest that the earlier fail- 
ures with the absolute method might 
be due to the possibility that the 
primary amounts were too large, to- 
ward the asymptote of the primary 
reward amount function. 

Actually, it does not look as if any 
clear-cut conclusions can be reached 
until more research has been com- 
pleted, for example using a design 
similar to Lawson’s (1957), allowing 
direct comparison of the absolute 
and differential methods. Perhaps 
the safest statement that can now be 
made is the empirical generalization 
that amount of primary reward does 
affect strength of secondary reward, 
and that this effect is more likely to 
be evidenced if the differential method 
of incentive presentation is used. 

Finally, this section might be con- 
cluded by mention of Wolfe’s original 
experiments on token-reward (1936). 
Wolfe used the differential method in 
which selection of a blue token 
yielded two grapes, a white token 
yielded one grape, and a brass token 
yielded nothing. Three chimpanzees 
learned to consistently choose the 
blue token, demonstrating that sec- 
ondary reward values can be dis- 
criminated, just as primary reward 
values can. 


SUMMARY AND CONCLUSIONS 


Perhaps the most significant find- 
ing of this review is the 
high degree of consistency of results. 
The facts seem well ordered, and 
where contradictory results are 
found, such disagreements can usu- 
ally be fairly readily explained in 
terms of procedural differences. The 
empirical laws are now before the 
theorist, for him to explain in what- 
ever logically consistent way he can 
devise. Because the findings seem so 
well ordered, he should perhaps have 
less trouble than in other aspects of 
behavior theory. 

We may conclude by bringing to- 
gether the various empirical gen- 
eralizations scattered throughout this 
paper, and offering some suggestions 
for further research. The following 
generalizations have emerged from 
the review of the evidence on incen- 
tive magnitude: 

1. With the absolute method, 
quantitative variation in incentives 
has no apparent effect on rate of 
learning. A possible exception in- 
volves the manipulation of concen- 
tration of sugar solutions. A second 
exception is that learning is more 
rapid with any positive incentive 
amount than with zero amount. 

2. Asymptotic performance is an 
increasing function of incentive mag- 
nitude. The function is negatively 
accelerated, and possibly logarithmic, 
although under some circumstances, 
nonmonotonicity appears. 

3. These asymptotic differences re- 
flect a direct effect on the level of per- 
formance, rather than an indirect ef- 
fect based on differences in amount 
learned. 

4. Magnitude of reward affects re- 
sistance to extinction indirectly 
through differences in terminal level 
of rewarded performance. 

5. There exists conflicting evidence 
on the interaction of incentive mag- 
nitude with drive level. 

6. None of the mechanisms whereby 
behavioral results of quantitative 
variation are effected are necessary. 

7. Sufficient mechanisms seem to 
be amount of consummatory activity 
and stimulation from the incentive 
associated with consummatory ac- 
tivity. In the former case, it is the 
rate of consummatory activity which 
seems paramount. (This set of con- 
clusions is based upon exclusion from 
consideration of administration of 
nutritive substances other than pe- 
ripherally.) 

8. With the differential method, 
incentive magnitude affects learning, 
but only after Ss have learned to dis- 
criminate amounts. 

9. The asymptotic incentive func- 
tion is steeper when training is by the 
differential method than by the abso- 
lute method. 

10. Acquired reward value is an 
increasing function of magnitude of 
primary reward. 

To these conclusions might be 
added a comment on the use of time- 
dependent measures as measures of 


nitude affect 
learning. An 
dependent measures that were af- 


The primary task of clinical diag- 
nosis is that of collecting, evaluating, 
and assimilating information with 
respect to the patient. The starting 
point is the information itself; this 
may be in the form of laboratory 
test results, biographical data, scores 
on psychological tests, manifest 
symptoms, or other observables. The 
end result is a judgment; this may 
take the form of a recommendation 
concerning treatment or discharge, 
a decision that certain other data are 
necessary before final judgment is 
made, or a classification of the pa- 
tient into a diagnostic category. 
What intervenes between beginning 
and end is, for each clinician, a quite 
complex idiosyncratic process. It is 
the purpose of this paper to demon- 
strate that the process is capable of 


rigorous investigation and descrip- 
tion. 


THE MENTAL PROCESS 


In dealing with the manner in 
which clinicians utilize information 
at their disposal to arrive at judg- 
ments or decisions, it may appear 
that investigations would be con- 
cerned primarily with mental proc- 
esses; and since mental processes 
have often been equated to or placed 
within the realm of subjective ex- 
perience, it would be well to make one 
or two observations for purposes of 
clarification. In the first place, the 
term mental process is often directly 
equated with subjective experience. 
But as private experience, the men- 
tal process is not observable. Hence, 
acceptance of this definition places 
the process beyond the realm of 
legitimate scientific inquiry, except 
as it may be inferred from observ- 
able phenomena such as verbal re- 
sponses. And since no criterion can 
exist for the validation of inferences 
concerning subjective experience, the 
inferences are simply ways of find- 
ing agreement in the use of language 
or other symbolic responses between 
the subject and the observer. If an 
observer makes “good” inferences 
concerning a subject, this means at 
most that a consensus exists be- 
tween them with respect to the sym- 
bolic behavior involved. Under- 
standing of the mental process qua 
subjective experience can never go 
beyond this level. 

On the other hand, mental process 
may be alternately defined. It may 
be considered as a physical (e.g., 
neurological, biochemical) event ca- 
pable of direct observation, i.e., using 
electrophysiological, neurophysio- 
logical, and similar techniques. To 
be sure, these techniques have so far 
yielded little that satisfactorily de- 
scribes cognitive mental functioning, 
and this is perhaps unfortunate. It 
does not follow that the approach is 
sterile. Improvement in the tech- 
niques of measurement and in the 
application of more explanatory mod- 
els may ultimately result in great 
progress, even though this may seem 
remote at the present time. This sec- 
ond definition is not unreasonable by 
any known standard, and it may 
surely encourage productive research 
and a resulting clarification of basic 
issues. But it is perhaps too early to 
say. 

This brings us to the third sense 
in which the term mental process 
may be employed. It should first be 
pointed out that any realm of sci- 
entific investigation is designed to 
provide, among other things, a use- 
ful level of objective description. 
Direct observation, testing, instru- 
mentation, and other related tech- 
niques are steps in this direction. 
When properly employed within a 
theoretical framework they seek to 
describe relationships between events 
or phenomena. The problem of de- 
scribing judgment can similarly be 
considered to be one which inter- 
poses a set of techniques and a the- 
oretical system between two sets of 
observables. Thus it is possible to 
“describe” the kinds of mental ac- 
tivity usually characterized as cogni- 
tive by means of mathematical 
models. One may thereby approach a 
level of description which is at least 
equal to that of other competitors in 
some respects, and certainly superior 
in other ways. That is to say, in con- 
trolled situations wherein the input 
(information) and the output (judg- 
ment) are known or capable of quan- 
tification, one may postulate func- 
tional relationships between input 
and output and assess their ade- 
quacy by determining the accuracy 
with which each is capable of pre- 
dicting judgment. The present paper 
is directed at this level of description. 
The term mental process refers 
simply to a functional relationship 
which accounts for consistencies in 
response to divergent stimulus (in- 
formation) patterns. It is thus a 
of intervening variables, nothing 
more. 

A question which immediately 
arises from the foregoing discussion 
is that of the adequacy with which 
it is possible to describe the mental 
processes underlying clinical judg- 
ment. In answer it may be said that 
the process is adequately described 
when a particular mathematical 
model quite effectively predicts judg- 
ments for any given set of informa- 
tion. This is consistent with the 
scientific meaning of the word “de- 
scription,” although considerations 
such as simplicity, generality, and 
the testability of derivations must 
be kept in mind. A major problem 
in the understanding of judgment 
will in this paper be considered to be 
that of formulating the kind of 
model which is sufficiently predic- 
tive, yet useful as a vehicle for ap- 
proaching other related problems in 
the area of judgment. Different 
kinds of models need to be dis- 
cussed, compared, and evaluated, 
and empirical findings are of course 
necessary. Subsequent sections of 
this paper will consider specific 
models for judgment. One, the 
linear model, is relatively simple; 
another to be described is somewhat 
more complex. Following this, some 
empirical findings will be offered as 
illustrative of the research oppor- 
tunities which unfold. 

Before beginning the discussion 
of specific models, however, it be- 
comes necessary to justify certain 
restrictions that must be imposed 
in order that meaningful inquiry 
may be made into the judgment 
process. The restrictions center in 
the nature of the information avail- 
able to the judge or clinician and 
upon which the judgment is con- 
tingent. The restrictions do not 
seriously impair the realism of the 
judgment situation as long as one 
ing some ingenuity to bear 
pe, a problems of quantifica- 
tion, but even this point of view 
may be objectionable to some. For 
completeness, therefore, and in order 
to provide early insight into the ex- 
perimental procedures used in the 
empirical studies to be later de- 
scribed, attention may now be di- 
rected to the problem of the nature 
of the information available to the 
judge. 


THE INFORMATION 


The information upon which clini- 
cal assessment is based may com- 
monly be expected to include any- 
thing or everything, depending upon 
the training and inclination of the 
diagnostician, and depending upon 
what is available or easily obtain- 
able. The lack of control over such 
information may be considered an 
asset or a liability, depending upon 
one’s orientation, and the accuracy 
of judgment may or may not be en- 
hanced through the inclusion of non- 
quantitative data (Ullmann & Berk- 
man, 1959; Holt, 1958; Luft, 1950; 
Meehl, 1954) depending upon the 
judgment domain or situation. 

What seems certain, regardless of 
the outcomes of empirical research, 
is that the uncontrolled use of clinical 
data, whether or not it exists in 
quantitative form, makes clinical 
assessment an artistic venture. It 
must of course remain a matter of 
one’s values as to whether this is 
wise. What seems equally certain is 
that any seriously conducted scien- 
tific study of judgment which has as 
its purpose the description of the 
method of combination used by the 
judge must take place in controlled 
settings, i.e., in such a way that the 
amount, kind, and nature of the in- 
formation available to the clinician 
or judge can be completely specified 
in objective terms. 

Controlling the judgment task to 
this degree has its advantages and 
its liabilities. On the one hand, re- 
stricting the situation as described 
assures that each person is evaluated 
with respect to the same informa- 
tion. Ambiguous and equivocal cues 
are removed, and all judges are 
thereby certain to have at their dis- 
posal the same information and no 
more. The inferences made beyond 
this point are thus certain to have 
their origins in the data provided. 
The major problem, that of describ- 
ing the idiosyncratic method of com- 
bination and weighting of this in- 
formation by the clinician, is there- 
by clearly defined. Clinical judg- 
ments are, of course, often made in 
settings wherein the kinds of infor- 
mation available may include inter- 
view and projective test impres- 
sions, etc. In addition, the kind of 
information available may vary con- 
siderably from one patient to the 
next. This may be said to pose a 
limitation to the situation described. 
Such information may be important 
in judgments, a point perhaps best 
made by Holt (1958), but unstruc- 
tured clinical judgment situations 
nonetheless make the contribution 
of such information experimentally 
impossible to assess. The problem 
here under investigation would, as a 
result, cease to be a scientific prob- 
lem altogether. 

Let us therefore consider that the 
situation in which a clinician makes 
evaluations of patients is restricted 
in the following ways: (a) the in- 


educed to a 
pect to which 


ple are evalu- 
information 
pressed in numbers or in categorical 
responses; 


satisfies as 


cations for each patient, where each 
number or class represents the de- 
gree to which a characteristic, trait, 
symptom, or biographical factor is 
present. One example of the situa- 
tion satisfying these restrictions is 
that of a symptom check list; an- 
other is that of a test profile. Rating 
scale data are likewise permissible, as 
would be combinations of these 
types. 

Having objectified the data upon 
which the judgments are to be 
based, we may now turn to a consid- 
eration of the model. 


THE LINEAR MODEL 


The linear model is one in which 
judgments are described as a simple 
weighted sum of the values of the 
information available. For a given 
clinician judging a number of people, 
we let J represent the judgment and 
consider it as a dependent variable. 
The dimensions of information are 
designated by Xs. These will, of 
course, be independent variables. 
If there are k sources of information, 
the linear additive model can be de- 
scribed as follows: 

$$J = f(X_i), \qquad i = 1, 2, \ldots, k$$

Since we are interested in a weighted 
sum of the $X_i$ we may write 

$$J = A_0 + A_1X_1 + A_2X_2 + \cdots + A_kX_k.$$

If the $A_i$ are so chosen as to yield the 
best possible weighted sum, i.e., so 
that the composite scores correlate 
maximally with J, the model is 
equivalent to a linear multiple re- 
gression equation wherein the 
weights to be applied to the inde- 
pendent variables are so chosen as to 
minimize the error in estimating an 
actual dependent variable from the 
weighted composite. 

Application of multiple regression 
procedures to the problems of judg- 
ment has been suggested by Bruns- 
wik (1947), and by Hammond (1955). 
Todd (1954) reports a study using 
regression coefficients and the multi- 
ple correlation coefficient for a de- 
scription of the clinical judgment 
process, where the task was to judge 
intelligence from a selected number 
of Rorschach signs. While such 
studies provide interesting implica- 
tions, it should be stressed that there 
are serious limitations with respect 
to the interpretation of results; 
limitations which may be minimized 
or overcome only through a detailed 
examination of the rationale under- 
lying the model, and through re- 
formulations or revisions of the 
model, should this be necessary. So 
as to insure the appropriateness of 
the linear model as a device for 
characterizing the judgment process, 
we consider in detail some of its 
properties, and provide the particu- 
lar reformulations where necessary. 

In the first place, and by virtue of 
the experimental control employed 
in the collection of the data, the 
only source of reliable judgment 
variance is from the information 
supplied. This is in objective form, 
e.g., it appears as a number, a desig- 
nated category, a position along a 
continuum, etc. Often these data ap- 
pear as test scores on a set of proto- 
cols being judged. Assuming that a 
judge combined the information in 
linear additive fashion, the multiple 
regression analysis will be quite ef- 
fective as a tool for describing the 
judgment process; i.e., the set of 
regression weights when applied to 
the corresponding predictors can 
quite properly serve as a model for 
judgment. Thus, the adequacy of 
the linear model can be assessed by 
inspection of the magnitude of mul- 
tiple R. If the judge integrates data 
in additive fashion as opposed to 
configurational or pattern analysis, 
the linear multiple correlation will 
approach unity when corrected for 
attenuation. Lesser values of R sug- 
gest progressively lesser utility for 
the linear model. 
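
The assessment just described can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the procedure of any study cited here: the function name `fit_linear_judgment_model`, the simulated protocols, and the simulated judge are all hypothetical. It fits the weights of the linear model by least squares and reports the multiple R between fitted and actual judgments (uncorrected for attenuation).

```python
import numpy as np

def fit_linear_judgment_model(X, J):
    """Least-squares fit of J = A0 + A1*X1 + ... + Ak*Xk.

    Returns the weights (A0 first) and the multiple R, i.e. the
    correlation between the weighted composite and the judgments.
    """
    X = np.asarray(X, dtype=float)
    J = np.asarray(J, dtype=float)
    design = np.column_stack([np.ones(len(J)), X])   # column of ones carries A0
    weights, *_ = np.linalg.lstsq(design, J, rcond=None)
    predicted = design @ weights
    multiple_r = np.corrcoef(predicted, J)[0, 1]
    return weights, multiple_r

# Hypothetical data: 200 protocols with 3 cues, judged by a simulated
# clinician who combines the cues additively with a little inconsistency.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
J = 1.0 + 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] \
    + rng.normal(scale=0.2, size=200)

weights, multiple_r = fit_linear_judgment_model(X, J)
print("weights:", np.round(weights, 3))
print("multiple R:", round(float(multiple_r), 3))
```

For a simulated judge of this additive kind the multiple R should be close to unity; a judge who relied on patterns among the cues would be expected to yield a lower value.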

Secondly, it may be noted that 
the regression weights signify, with 
certain limitations, the emphasis or 
importance attached to each of the 
predictor variables by the judge. 
Large coefficients mean, empirically, 
that the corresponding predictors can 
account for large proportions of the 
variance of judgment; and a pre- 
dictor with a small beta coefficient 
contributes little beyond the con- 
tribution of other predictors. In 
practice, characterization of the 
judgment process by means of beta 
coefficients has three limitations: 
(a) since Js differ with respect to the 
size of their multiple R, direct com- 
parisons of sets of beta coefficients 
between Js are not meaningful; (b) 
beta coefficients do not account for 
all the predictable variance; and (c) 
beta coefficients do not allow for the 
assessment of the independent con- 
tribution of each predictor. What 
would be more appropriate would be 
a set of weights which are compar- 
able from one J to the next, which 
are capable theoretically of account- 
ing for all of the predictable variance, 
and which carry exact interpreta- 
tion in terms of components of vari- 
ance. 


RELATIVE WEIGHTS 


The formulation that is required 
is fortunately not difficult. Beta 
weights ($\beta_{0i}$) can be converted into 
a set of relative weights ($w_{0i}$) which 
have all the advantages described. 
We show first that the variance of 
predicted scores (which in this paper 
refers to predicted judgments) can 
be partitioned into two sources: one 
a sum of squared beta coefficients, 
and the other a residual of weighted 
covariances. 

Let 

$x'_0$ = the predicted score for an in- 
dividual (protocol) in reduced 
standard form. 

$x_i$ = the standard score of the ith 
predictor (on the protocol). 

$\beta_{0i}$ = the beta coefficient for the ith 
predictor. 

$$x'_0 = \beta_{01}x_1 + \beta_{02}x_2 + \cdots + \beta_{0i}x_i + \cdots + \beta_{0k}x_k$$

or 

$$x'_0 = \sum_{i=1}^{k} \beta_{0i}x_i$$


et 
The term in parenthesis is a 
weighted variance-covariance ma- 
trix. It can be divided as follows: 

$$\left[\sum_{i=1}^{k} \beta_{0i}x_i\right]^2 = \sum_{i=1}^{k} \beta_{0i}^2x_i^2 + 2\sum_{i=1}^{k}\sum_{j=1}^{k-1} \beta_{0i}\beta_{0j}x_ix_j \quad (i > j)$$

The first quantity on the right, 
when squared and averaged over in- 
dividuals, yields the squares of the 
$\beta$ coefficients. Thus: 

$$\frac{1}{N}\sum^{N}\sum_{i=1}^{k} \beta_{0i}^2x_i^2 = \sum_{i=1}^{k} \beta_{0i}^2$$

since the $x_i$ are standard scores. 
Similarly, the second quantity is 

a weighted sum of the intercorrela- 
tions among the predictors. Thus: 

$$\frac{1}{N}\sum^{N}\left[2\sum_{i=1}^{k}\sum_{j=1}^{k-1} \beta_{0i}\beta_{0j}x_ix_j\right] = 2\sum_{i=1}^{k}\sum_{j=1}^{k-1} \beta_{0i}\beta_{0j}r_{ij} \quad (i > j)$$

It follows that the variance of pre- 
dicted scores is described by a simple 
sum of squared beta coefficients if 
and only if the covariance terms 
vanish. One special case in which 
this will be true is that of orthogonal 
predictors. 
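
The partition can be checked numerically. The sketch below is an illustration only, assuming hypothetical simulated predictors and arbitrary beta coefficients; the function name `partition_predicted_variance` is not from the source. It standardizes the predictors, forms the predicted standard scores, and confirms that their variance equals the sum of squared beta coefficients plus the weighted sum of predictor intercorrelations.

```python
import numpy as np

def partition_predicted_variance(Z, betas):
    """Split the variance of predicted scores into its two sources.

    Z     : N x k matrix of predictors in standard-score form
            (mean 0, population standard deviation 1).
    betas : the k beta coefficients.
    Returns (total variance, sum of squared betas, covariance residual).
    """
    Z = np.asarray(Z, dtype=float)
    betas = np.asarray(betas, dtype=float)
    predicted = Z @ betas                      # predicted standard scores
    total = np.mean(predicted ** 2)            # variance, since the mean is 0
    r = (Z.T @ Z) / len(Z)                     # predictor intercorrelations
    weighted = np.outer(betas, betas) * r      # beta_i * beta_j * r_ij
    sum_sq_betas = np.sum(betas ** 2)
    covariance_part = weighted.sum() - np.trace(weighted)  # off-diagonal terms
    return total, sum_sq_betas, covariance_part

# Hypothetical data: three predictors, two of them correlated.
rng = np.random.default_rng(1)
raw = rng.normal(size=(500, 3))
raw[:, 1] += 0.5 * raw[:, 0]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
betas = np.array([0.5, 0.3, 0.2])

total, sum_sq, cov_part = partition_predicted_variance(Z, betas)
print(round(total, 6), round(sum_sq + cov_part, 6))
```

When the predictors are orthogonal the covariance residual is zero and the variance reduces to the sum of squared beta coefficients, which is the special case noted in the text.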

Relative weight, $w_{0i}$, is defined as follows: First we note that

$$\sqrt{\beta_{01}r_{01} + \beta_{02}r_{02} + \cdots + \beta_{0k}r_{0k}} = R_{0.12 \ldots k}$$

or

$$\sqrt{\sum_{i=1}^{k} \beta_{0i}r_{0i}} = R_{0.12 \ldots k}$$

Squaring both sides and dividing by $R^2$, we get

$$\sum_{i=1}^{k} \frac{\beta_{0i}r_{0i}}{R^2_{0.12 \ldots k}} = 1$$

Therefore, in interpreting by independent components of variance, we
express relative weight as

$$w_{0i} = \frac{\beta_{0i}r_{0i}}{R^2_{0.12 \ldots k}}$$

where

$\beta_{0i}$ = the beta coefficient for the ith predictor
$r_{0i}$ = the validity coefficient (correlation with judgment) of the
ith predictor
$R^2_{0.12 \ldots k}$ = the squared multiple correlation coefficient
reflecting the best linear combination of the k predictors in
prediction of judgments.
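
As an illustration, relative weights can be computed directly from the predictor intercorrelations and the validity coefficients. The sketch below is an assumption-laden example rather than the author's procedure: the helper names and the three-predictor correlation values are hypothetical, and the beta weights are obtained by solving the normal equations in standard-score form.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def relative_weights(r_xx, r_0):
    """Return (betas, R squared, relative weights).

    r_xx : intercorrelations among the k predictors
    r_0  : validity coefficients (correlation of each predictor with judgment)
    """
    betas = solve(r_xx, r_0)                          # standard-score betas
    r_squared = sum(b * r for b, r in zip(betas, r_0))
    weights = [b * r / r_squared for b, r in zip(betas, r_0)]
    return betas, r_squared, weights


# Hypothetical correlations, for illustration only.
r_xx = [[1.0, 0.3, 0.2],
        [0.3, 1.0, 0.4],
        [0.2, 0.4, 1.0]]
r_0 = [0.6, 0.5, 0.4]

betas, r_squared, weights = relative_weights(r_xx, r_0)
```

By construction the relative weights sum to 1, so each can be read as the proportion of predictable variance associated with one predictor.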

Finally, a description of the judg- 
ment process by means of linear re- 
gression procedures and relative 
weights allows one to go on to studies 
of varied sorts. Judges may be com- 
pared and contrasted with respect 
to their characteristic equations; and 
differences among judges may be re- 
lated to training, personality, and 
other factors that could conceivably
affect the utilization of data. Many
other problems immediately suggest
themselves.

The linear model may effectively 
be able to predict (or describe) clini- 
cal judgments to a very considerable 
degree, but there may be other situa- 
tions for which linear models are 
not appropriate—just as there must 
be many judges for whom more com- 
plex models are necessary. Let us 
now turn our attention to a second 
type of model. 


CONFIGURATIONAL MODELS 


In very general terms, the configurational model can be described as

$$J = f(X_1, X_2, X_3, \ldots, X_k)$$

wherein the exact functional relationship involving the k independent
variables may be described in any of a number of ways. As an example,
let us consider a particular type of function, one which shall be
referred to as an interaction model. The interaction model describes
judgment as an appropriately weighted composite of all possible first
order interactions of the predictors. Thus we may write

$$J = A_0 + \sum_{i=1}^{k}\sum_{j=1}^{k-1} A_{ij}X_iX_j$$

portant qualification is necessary.
Even in the hypothetical situation 
in which prediction is perfect, one 
cannot conclude that the mental 
process has been “discovered.” By 
definition, of course, this point should 
be obvious, but it is well to point 
out that even among sets of mathe- 
matical relationships (models) which 
are ostensibly different, there may 
be some which are in fact equivalent 
with respect to explanatory power. 
An example may serve to clarify 
this point. Let us assume that for a 
given judge, and for two informa- 
tion variables, X and Y, the judg- 
ments can be independently pre- 
dicted from X and Y with 95% ac- 
curacy by the following equation: 


$$J' = +\sqrt{X^2 + Y^2 + 2XY}$$

We note that the right hand term is simply the square root of the
binomial $(X+Y)^2$. Since $X+Y = +\sqrt{X^2 + Y^2 + 2XY}$, it follows
that the equation $J' = X+Y$ will account for the judgments equally as
well as the expression

$$J' = +\sqrt{X^2 + Y^2 + 2XY}$$


It is therefore no more reasonable to 
conclude that the judge is in fact 

using one particular combination 
of the information than it is to con- 


clude that he is using the other. One 
would have t 


teria before 


t be a trouble- 
matical models 
vi 
whereby one set of aA piee 
ted from an- 
testable deriva- 
more complete 
ding of the phe- 


nomena. Such models therefore con- 


stitute a level of description and ex- 
planation which suffices for scientific 
purposes. It is not required of 
models that they bear any semblance 
of some ‘“‘actual” state of affairs, 
either within the organism or else- 
where, nor would this necessarily 
lead to a better understanding of 
nature. 

This may be more clearly seen in 
relation to an example from the phys- 
ical sciences. A chemist has the task 
of describing a substance. He per- 
forms a number of operations (or 
tests) on the substance and deter- 
mines its chemical composition. It 
turns out to be a relatively simple 
task. The substance is described, 
chemically, as CaCO3. But the work
is not complete, for a different set of 
operations (or tests) might have pro- 
duced a different level of descrip- 
tion; one which may not be neces- 
sary for the chemist, but which is 
more suitable for other contexts. The 
mineralogist, for example, would per- 
form a series of tests to measure the 
optical properties of the crystal, or 
he might examine its hardness, solu- 
bility, and other characteristics. Two 
crystals identified by the chemist as 
CaCO3 might in fact be somewhat
different from one another to the 
mineralogist, one being aragonite, 
and the other calcite. These two 
crystals do in fact have the same 
chemical structure, but they differ
in molecular structure; this being re- 
vealed by optical and other tests. It 
is apparent that different levels of
description are possible with regard 
to substances, and that each has its 


peculiar advantages and shortcom- 
ings. 


In mineralogy, calcite is commonly described as a paramorph of
aragonite. This word is used generally to describe a substance having
crystalline structural properties which differ from those of another
substance with the identical chemical composition.²

We have borrowed the term para- 
morphic from mineralogy and em- 
ploy it in relation to representations 
of human judgment. The analogy 
may not be complete, but its limita- 
tions are not serious. The mathe- 
matical representation of the judg- 
ment process is a level of description 
that approaches the chemical de- 
scription of minerals. The formula 
helps to account for or “explain” 
what is observed concerning cer- 
tain properties or characteristics of 
the judge, just as the chemical 
formula ‘explains’ many, though 
not all, properties or characteristics 
of the substance. In addition, the
formulae are useful in making
predictions concerning the outcomes of
certain other tests which may 
later be employed. But as with 
chemical analysis, the mathematical 
description of judgment is inevitably 
incomplete, for there are other prop- 
erties of judgment still undescribed, 
and it is not known how completely 
or how accurately the underlying 
process has been represented. The 
term “paramorphic representation,” 
used in relation to judgment, would 
seem adequately to indicate this 
state of affairs. 


THE LINEAR MODEL: REPRESENTATIVE RESULTS


As illustrative of the significance 
of the methodology described above, 
four persons who participated as 
judges in studies at Oregon have 


2 More exactly, the crystal is called a para- 
morph when it is shown to be an alteration 
in crystal structure. In the example cited, 
aragonite may undergo change over a period 
of time, finally becoming calcite. It is the 
calcite that results from alteration of aragonite 
which is paramorphic.

Fig. 1. A sample profile for the judgment of intelligence.


been selected. Two of these made 
judgments of “intelligence” of 100 
persons on the basis of a set of nine 
predictors or sources of information. 
The remaining two made judgments 
of “sociability” of 150 persons on the 
basis of profiles containing scores on 
eight selected Edwards Personal 
Preference Schedule (EPPS) vari- 
ables. In all cases, the judges re- 
turned after an intervening period of 
several days and made a second set 
of judgments on the same profiles. 
Sample profiles are shown in Fig. 1 
and Fig. 2. 

Application of standard multiple 
regression procedures reveals that a 
best linear combination of the pre- 
dictor scores correlates .948 and .829 
with the judgments of Judges 15 
and 18, respectively. But to what 
extent can it be said that the linear 
model adequately characterizes the 
judgment process for these judges? 
The reliability of judgment for these 
two is .876 and .836 respectively. 
Correction for attenuation results in 


3 Corrected for shrinkage. Multiple Rs 
computed from a cross-validation sample of 
protocols were .937 and .837 for these Js respectively.


Fig. 2. A sample profile for the judgment of sociability.


coefficients which are 1.00+ and
.907. Thus it may be seen, in the


ual or error 


in comparison 
with that which is predictable from


the model. For the second judge the 
case is somewhat different. When 
unreliability of judgment is taken 
into account, there still 
17.7% of variance which is unac- 


It follows, 


therefore that the linear model, 
while quite sufficient for the first 
judge, 


is less appropriate for the 


ably unanswerable, but it is quite 
proper to ask a related one. Con- 
cerning the method of utili 

information, what corre: 
ists between the verb: 


offered by the judge and the descrip- 


tion achieved by the multiple regres- 
sion model? 

There are some difficulties in attempting to ascertain, from the
statements of the judge, a subjective impression of a cognitive
process. In instances in which persons are asked to make judgments of
intelligence and sociability from the sets of information that have
just previously been described, it is rarely true that the judge has a
high degree of confidence in statements he may make concerning the
relative importance of the predictor variables. In some instances
difficulties of communication emerge; when, for example, a judge finds
it necessary to relate some rather complex or configurational analysis
that he feels best describes his own “method of combination.” One
alternative is to ask the judge to distribute 100 points among the
sources of information available and in such a way that this
distribution reflects, to the best of his knowledge, the relative
importance of those variables.

Such a task is easily understood by the judge. The method has the
additional advantage of insuring that his


Person, Pr 
tion is lost 


to be adequately describable by a 
weighted sum of the information, 


The number of points assigned in this way to each of the information
variables will be referred to as the subjective weight ($s_{0i}$).
Comparisons of subjective and relative weights are shown for the two
judges of intelligence in Figs. 3 and 4. With respect to Judge 18,
there is a high degree of agreement of relative and subjective
weights. For Judge 15, however, there are greater discrepancies,
disagreement being most pronounced in Variables 1, 4, and 7. These
judges differ in the extent to which they are capable of assigning a
set of numbers to the sources of information used in judgment so as to
approximate the relative weights as determined by the linear model.
Results from a relatively large sample of judges will be reported in a
forthcoming article.

Fig. 3. Comparison of relative and subjective weights in the judgment
of intelligence (Judge 15).

Fig. 4. Comparison of relative and subjective weights in the judgment
of intelligence (Judge 18).


The two examples from the socia- 
bility experiment can be used to 
illustrate the same phenomena. The 
relevant information is described in 
Table 1 and in Figs. 5 and 6. 
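
The attenuation-corrected coefficients and nonlinearity percentages reported for these judges can be reproduced with a short calculation. The sketch below assumes, since the paper does not print the formula, that the multiple R is divided by the square root of the judge's reliability alone (the profile scores being treated as error-free), and that nonlinearity is the percentage of reliable variance left unpredicted. These assumptions match the reported figures; the function names are hypothetical.

```python
import math


def correct_for_attenuation(multiple_r, reliability):
    """Divide R by the square root of the reliability of the judgments."""
    return multiple_r / math.sqrt(reliability)


def percent_nonlinear(corrected_r):
    """Percentage of reliable judgment variance not predicted by the model."""
    return 100.0 * (1.0 - corrected_r ** 2)


# (multiple R, reliability) as reported in the text and Table 1.
judges = {
    "Judge 18": (0.829, 0.836),
    "Judge 3": (0.901, 0.830),
    "Judge 6": (0.770, 0.830),
}

results = {}
for name, (multiple_r, reliability) in judges.items():
    # Round the corrected R to three places first, as in the table,
    # then derive the nonlinearity percentage from the rounded value.
    corrected = round(correct_for_attenuation(multiple_r, reliability), 3)
    results[name] = (corrected, round(percent_nonlinear(corrected), 1))
```

For Judge 15 (R = .948, reliability = .876) the same ratio exceeds unity, which is consistent with the entry reported as 1.00+.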


A CONFIGURATIONAL MODEL: 
REPRESENTATIVE RESULTS 


Data for one type of configura- 
tional model comes from a study by 
Martin (1957). On the basis of 
lengthy and rather complete inter- 
views, Martin was able to obtain 
fairly clear statements from a set of 
five counselling psychologists. These 
statements expressed the manner in 
which the psychologists believed 
they were utilizing the set of eight 
EPPS variables in the prediction of 
sociability. Of the five judges, Psychologist D has been selected for
illustrative purposes.


TABLE 1

GOODNESS OF FIT OF LINEAR MODEL FOR JUDGMENTS OF INTELLIGENCE & SOCIABILITY

                            r_ii    R_c    Attenuated R   Nonlinearity(a)
Intelligence   Judge 15     .876    .948      1.00+            -
Intelligence   Judge 18     .836    .829       .907          17.7
Sociability    Judge 3      .830    .901       .989           2.2
Sociability    Judge 6      .830    .770       .845          28.6

(a) % of variance unpredictable from the linear model.


Fig. 5. Comparison of relative and subjective weights in the judgment
of sociability (Judge 3).


His verbalizations with respect to the judgment of sociability follow:


As might be expected I generally look for 
some over-all patterning of the test variables. 
Although I do hold in mind certain standards 
and/or tendencies. In the following discus- 
sion when I refer to high I mean at least 1 SD 
above the mean. For a rating of high Soci- 
ability, i.e., 7, 8, or 9, I would generally expect 
at least two or three scales (Exh.+Dom.
+Het.) to meet the criterion of 1 SD above
the mean with others at least in the middle 
range or pointing in the direction of high. I 
would also expect for this rating that Aba. be 
in the average range or in the direction of low. 
The more that Exh.+Dom.+Het. approach 
the high extreme and Aba. the low extreme the 
more apt I am to make a higher rating. In 
this concept the other four scales act in pairs, 


If both are ex- 
to the rating and 


cf y tend to detract 
from the rating of high Sociability, 


high rating are met I 
would suspect the reliability of 


In the case of a low rating (1, 2, 3) I would 
generally expect a some 

ing. For example, here 
to be quite high with pair Suc. and Def. in the 
middle range or pointing in the direction of 
high. Similarly, I would expect at least two of 
the three scores (Dom., Exh., and Het.) to be 
in the middle or low range with none of them
extremely high. The higher Aba. along with
the pair Def. and Suc. and the lower the 
scales Dom., Het., and Exh., the lower the 
rating. The pair Chg. and Aff. again have 
little effect unless they are significantly low 
and then they tend to support or add to a 
low rating. 

In rating within the average range I look 
for the significant scales Exh., Dom., Het., 
and Aba., not be extreme in either direction. 
While the variables are considered in relation 
to one another (a high score on Exh. offsets a 
high score on Aba.) they contribute to my final 
rating whether singly or in pairs. 


From the foregoing description, it may be argued that the set of
variables underlying D's judgments must take into account the
following considerations:


1. Interaction. Certain of D's statements imply an interactive or
multiplicative relationship between two predictors and the judgment
criterion. A good example of this is the statement, “The scales Chg.
and Aff. are also considered as a pair but add or detract nothing to
the rating unless both are quite high or quite low.”

2. Nonlinearity. Since D has stated that variables are of most
importance when their values are high, and that values between ±1σ of
the mean are usually ignored, an exponential function should be more
predictive of judgment than a linear one.


As a result, the following predictors are defined:

X_1  Abasement
X_2  Exhibitionism
X_3  Heterosexuality
X_4  Dominance
X_5  Change × Affiliation
X_6  Succorance × Deference


Fig. 6. Comparison of relative and subjective weights in the judgment
of sociability (Judge 6).



Each of the six predictors can then 
be subjected to a nonlinear trans- 
formation, the result of which is to 
correct for the stated tendency of 
the judge to discount predictor scores 
near the mean of the distribution, 
and to emphasize them increasingly 
as they become extreme, up to a 
limit. The resulting variables are 
then weighted according to least 
squares procedures. 
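
The exact transformation is not given in the text, so the following is only a sketch of one function with the stated properties: scores near the mean are discounted, more extreme scores are emphasized increasingly, and the effect stops growing beyond a limit. The function name, the 1 SD dead zone, the squared growth, and the 3 SD ceiling are all assumptions made for illustration.

```python
import math


def emphasize_extremes(z, dead_zone=1.0, limit=3.0, power=2.0):
    """Transform a standard score so that only extreme values count.

    Scores within `dead_zone` SDs of the mean contribute nothing; beyond
    that the contribution grows as a power of the excess, and scores more
    extreme than `limit` SDs are treated as if they sat at the limit.
    """
    excess = min(abs(z), limit) - dead_zone
    if excess <= 0:
        return 0.0
    # copysign keeps the direction (above or below the mean) of the score.
    return math.copysign(excess ** power, z)


# Hypothetical standard scores on one predictor, for illustration only.
scores = [-3.5, -2.0, -0.5, 0.0, 0.8, 2.0, 5.0]
transformed = [emphasize_extremes(z) for z in scores]
```

The transformed variables, rather than the raw scores, would then be entered into the least squares weighting described above.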

As in the case of the linear model, 
it is possible to compute relative 
weights and to compare these with 
the weights assigned subjectively. 
This comparison is shown in Fig. 7.
We shall discuss the data shortly.

What classes of problems can be 
attacked through the technique of 
the configurational model? As in the 
case of the linear model, it furnishes 
a description of the relative im- 
portance (to the judge) of the vari- 
ous sources of information available. 
But greater latitude is possible. Con- 
figurational models are capable of 
handling the complexities and pat- 
terns believed by many to be an es- 
sential (if not “natural”) part of the 
judgment process. Thus, one may 
ask whether a graduate course in 
psychodiagnostics or in personality 
assessment was effective in produc- 
ing students who are “‘configura- 
tional” in their interpretation of 
case material. 

A second class of problem is that 
of individual differences. Do persons 
differ in the type of model which 
most appropriately accounts for their 
judgments? If so, in what respects? 
And what proportion of these dif- 
ferences can be attributed to specific 
training? To personality? To intel- 
lectual characteristics? 

Fig. 7. Comparison of relative and subjective weights, using a
nonlinear model. Clinician D (From Martin, 1957).


There is a third class of problem, one which is at least of equal
significance to the others mentioned. This
concerns the stability of the judg- 
ment process. Can a “linear” judge 
be taught to be more complex? Can 
any judge be taught to make more 
efficient and more accurate use of in- 
formation? Will this generalize or 
transfer to other judgment situa- 
tions? What are the personality 
characteristics which differentiate the 
flexible judge from the unchangeable 
one? Some of these problems are 
currently under investigation and 
results will appear in forthcoming 
publications. 

Finally, it may already have be- 
come apparent that the issue of 
nomothetic vs. idiographic approaches, 
first proposed by Windelband (1904) 
and discussed more thoroughly else- 
where (Allport, 1937; Sarbin, 1944) 
may be approached in one of its 
major aspects through the use of 
judgment models. If the argument 
is confined to that of “method of 
combination,” as suggested by Meehl 
(1954), representational models are 
capable of providing some strik- 
ingly clear evidence. Martin (1957) 
presents some interesting findings 
in this respect, and it may be well 
for illustrative purposes to return to 


130 PAUL J. HOFFMAN 


Psychologist D, discussed above, 
and ask what the data suggest. 

The first thing that might be said 
with respect to this psychologist is
that, after having defined the pre- 
dictors to his liking, he is not as 
astute as he might be in attaching 
subjective weights to them. By this 
it is meant that the discrepancies be- 
tween subjective and relative weights 
are by no means insignificant in an 
absolute sense. The matter can be 
made even clearer by stating it an- 
other way: Given the variables in 
question, a computing machine would 
come closer to reproducing Ds judg- 


of particular importance is the fact 
that both of the interaction variables
were overevaluated in this respect.
Perhaps one likes to think of
himself as being more complex than 
he actually is. 

Since the computer does as well as 
it does in reproducing the judgments, 
an example is here provided wherein 
a single consistent set of rules of 
combination (nomothetic) may be 
successfully applied to the many 
and varied cases. The resulting
judgments may well 
“configural” 


» Particular] 


is more to illustrate the app 
of a methodology th 
reforms in clinical p 
evidence based on a sample of size 1.


What, additionally, may be said of the configurational model developed
for D? We may legitimately ask to what extent the substitution of
complex variables in the place of simple ones effectively enhanced the
prediction of judgments, thereby testing the relative efficacy of two
models. The R (corrected for shrinkage), using the configurational
model, is .88. This appears high, and it may well be, except for the
fact that the application of a linear model to D’s judgments results in
a corrected R of .91. Thus, substituting a configurational model for a
linear one changes the proportion of predicted variance of judgments
from 82.81% to 77.44%; a loss of a little better than 5%! And
considering chance factors, the least that can be said is that there is
no demonstrable gain over the linear model.
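
The percentages quoted in this comparison are squared multiple correlations. As a quick check, and as a sketch only (the function name is hypothetical), 82.81% corresponds to R = .91 and 77.44% to R = .88:

```python
def percent_variance_predicted(multiple_r):
    """Proportion of judgment variance predicted, expressed as a percentage."""
    return 100.0 * multiple_r ** 2


linear = percent_variance_predicted(0.91)           # linear model
configurational = percent_variance_predicted(0.88)  # configurational model
loss = linear - configurational                     # percentage points lost
```

The difference between the two models is a little over five percentage points of predicted variance, in favor of the simpler linear model.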


SUMMARY 


This paper has been concerned
with the mann


It is shown that mathematical models provide a way of describing
mental processes which would otherwise be accessible only through
introspection or electrophysiological techniques. A linear model and a
configurational model are described, and illustrations furnished for
each. Such models make
possible the tes


Concerning m 


ences in judgment
ability, effects of training, personality
correlates, idiographic interpretation of


PARAMORPHIC REPRESENTATION 


cision making can illuminate prob- 
lems which would otherwise remain 
obscure. In focusing upon the indi- 
vidual as the unit of research while 
at the same time preserving method- 
ological rigor it becomes possible to 
achieve a level of psychological
description which would otherwise be
quite difficult. And few would dis- 
agree with the suggestion that sound 
description of the decision process 
is quite fundamental to a complete 
understanding of man.


THE WECHSLER INTELLIGENCE SCALE FOR CHILDREN¹


REVIEW OF A DECADE OF RESEARCH 


WILLIAM M. LITTELL 
Claremont Graduate School 


In the 10 years since its publica- 
tion the Wechsler Intelligence Scale 
for Children (WISC) (Wechsler, 
1949) has found wide acceptance 
among psychologists working with 
children in schools, clinics, and
hospitals. With the wide use of the
WISC, not only as a measure of in- 
telligence but often as a clinical diag- 
nostic instrument, it seems advisable 
to take a careful look at the growing 
fund of literature concerning the 
WISC, its validity in its various uses, 
and its general characteristics as a 
measuring device.


THE WISC


The WISC was developed as a 
downward extension of the Wechsler- 
Bellevue Intelligence Scales (W-B) 
(Wechsler, 1944), and most of the 
items contained in the WISC are 
from Form II of the adult scales 
(Wechsler, 1949; Seashore, Wesman 
& Doppelt, 1950). Easier items have
been added to the low end of the sub- 
tests to make it suitable for use with 


Comprehension, Arithmetic, Similarities, Vocabulary, and Digit Span),
and a Performance Scale (Picture Completion, Picture Arrangement,
Block Design, Object Assembly, Coding, and Mazes). In the standardization
of the WISC all 12 subtests were ad- 
ministered; only 10, however, were 


1 The author wishes to express his apprecia- 
tion to Robert Allen Keith for his many sug- 
gestions and comments through the several 
drafts of this paper.


used to establish the IQ tables. Digit Span and Mazes were omitted
“primarily [because of] their relatively low correlation with the
other [sub]tests of the Scale and also, in the case of Mazes, the time
factor” (Wechsler, 1949, p. 6). Wechsler suggests that all 12 subtests
be given whenever possible “because of the qualitative and diagnostic
data they add” (Wechsler, 1949, p. 6). When 11 or 12 subtests are
used, prorating is necessary.

The WISC was standardized on 
2200 white American 


chosen to be representative of the 


t’s occupation, 


and geographic area. Some adjustment was made to allow for the “recent
shift of population to the West.” One hundred boys and 100 girls were
chosen at each of 11 age levels, ages 5 through 15. Except for the 55
mentally deficient children included in the sample, all children were
within one and one-half months of their mid-years. The mentally
deficient group was drawn primarily from institutions in Illinois,
Michigan, and New York, and more lenient age standards were observed.
The standardization tests were administered by 17 field examiners who
worked in 85 different communities.

The WISC IQ’s (Verbal, Per- 
formance, and Full Scale) are devia- 
tion scores based on norms from 
other children of the same age. The
raw scores obtained from subtests
are transmuted into Scaled Scores by
separate tables for each four-month
age span (e.g., 5-0 to 5-3) and then
into IQ’s with a mean of 100 and a
standard deviation of 15.
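
The logic of a deviation score can be sketched as follows. The WISC itself converts scores through the published tables for each age group, so this is only an illustration of the principle, not the manual's procedure: the function name and the norm-group mean and standard deviation used in the example are hypothetical.

```python
def deviation_iq(score, norm_mean, norm_sd, iq_mean=100.0, iq_sd=15.0):
    """Express a score as a deviation IQ relative to same-age norms.

    score     : the child's summed scaled score
    norm_mean : mean of that score in the child's own age group
    norm_sd   : standard deviation of that score in the same age group
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd   # standing within the age group
    return iq_mean + iq_sd * z


# Hypothetical age-group norms, for illustration only.
average_child = deviation_iq(50, norm_mean=50, norm_sd=10)
above_average = deviation_iq(60, norm_mean=50, norm_sd=10)
```

Because the child is compared only with others of the same age, a score at the age-group mean always yields 100, and a score one standard deviation above it yields 115.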

Early reviews of the WISC. Early
reviews of the WISC varied from 
somewhat qualified acceptance 
(Delp, 1953b; McCandless, 1953) 
and the prediction of wide usage 
(Shaffer, 1949), to a rather critical 
rejection of the WISC in favor of the 
S-B (Anderson, 1953). All of the re- 
views were favorably impressed by 
the care taken in the standardization. 
Several specific criticisms were men- 
tioned, however: the WISC manual 
lacks any evidence for its over-all 
validity (Delp, 1953b; McCandless, 
1953; Shaffer, 1949); it provides a 
temptation to do elaborate pattern 
analyses on scores (McCandless, 
1953) without providing substantia- 
tion for any interpretive value (Delp, 
1953b; McCandless, 1953; Shaffer, 
1949); it does not provide for ex- 
tremely high (above 155) or ex- 
tremely low (below 45) scores (Delp, 
1953b); and the subtests appear to 
be too difficult for very young chil- 
dren (Delp, 1953b). The mental age 
was missed by Anderson (1953), Delp 
(1953b), and Shaffer (1949), and the 
lack of Negro children in the stand- 
ardization group was mentioned as a 
weakness by McCandless (1953). 
Delp (1953b) felt further that the 
scoring of certain verbal items in- 
cluded considerable subjectivity. An- 
derson’s rather critical review (1953) 
mentioned the fact that raw scores of 
zero are given scaled scores above 
zero for the younger children. Ander- 
son was, in fact, able to find several 
apparent discrepancies in Wechsler’s 
statistical treatment of the standard- 
ization data. 

Strong points brought out by the 
reviewers of the WISC were its up- 
to-date construction (Delp, 1953b; 
McCandless, 1953) and its standard- 
ization (Anderson, 1953; Delp, 1953b; 
McCandless, 1953; Shaffer, 1949). 


Mentioned also as strong points were 
the facts that all of the children are 
administered comparable batteries 
(McCandless, 1953); the time of ad- 
ministration appears to be shorter 
(Delp, 1953b) and more predictable 
(McCandless, 1953) than comparable 
tests; it is easy to administer, in- 
teresting to children, gives both a 
Verbal and Performance IQ, provides 
IQ's directly comparable for various 
ages, appears to have potential clini- 
cal use, and has an easily used man- 
ual (Delp, 1953b). 

A framework for evaluation. A
word should be said concerning the 
framework within which this evalua- 
tion of the WISC is conducted. In 
addition to a number of articles re- 
porting research on or with the 
WISC, the last 10 years have also 
shown advances in the methodology 
of psychological measurement and 
theory construction. (See especially 
Coombs, 1951; Coombs, Raiffa, & 
Thrall, 1954; American Psychological 
Association, 1954; Cronbach & 
Meehl, 1955.) The psychological test 
has come to be seen as only one ele- 
ment in the total process of theory 
construction. The full value of the 
test as a measure of a psychological 
variable depends upon how well the 
entire system in which it is used 
stands up to both logical and experi- 
mental test. 

This view of the over-all “validity” 
of a test demands that (a) the area of 
the object world to be covered, (b) 
the nomothetic network containing 
the variable, and (c) the steps by 
which the test is demonstrated to be 
a measure of the variable, be made 
public, and that all assertions be 
subjected to empirical test. 

This article is a review of the lit- 
erature concerning the WISC since 
its publication in 1949. Its purposes 
are twofold: (a) to evaluate the 
WISC as a measure of various
psychological variables, and (b) to bring
together for the user of the WISC
information provided by the past
decade of research.


THE WISC AS A MEASURE OF INTELLIGENCE

Content Validity


In practice, a great deal of weight 
is often given to the user’s assessment 
of the content validity of the WISC. 
The actual assessment is not simple, 
however, and is complicated by the 
similarity both in form and content 
between the WISC and the adult 
scales. As noted by others (Delp, 
1953b; Shaffer, 1949), it tends to have 
attributed to it the validity of these 
other scales, 


measure a 
demonstrated oe 


sample of a uni- 


or exploration, 
c criterion has 
consistently led to misinformation, 


To assess the content validity of 
the WISC, a universe of items must 
be defined which is relevant to 
Wechsler’s concept of children’s
intelligence. Unfortunately, beyond a
few general remarks (Wechsler: 1949, 
1950; Wechsler & Weider, 1953), no 
theoretical discussion of the concept 
of intelligence as it applies to children 
exists in print. To proceed, the as- 
sumption must be made that, at least 
in its more general aspects, the dis- 
cussion of adult intelligence (Wechs- 
ler: 1944, 1958) is applicable to chil- 
dren. 

Wechsler’s definition of intelligence 
is very broad. As far as the trait 
“general intelligence” is concerned,
any item which is judged to tap a 
child’s “aggregate, or global capacity 
to act purposefully, to think ration- 
ally, and to deal effectively with his 
environment” (Wechsler, 1944, p. 3) 
might be included as a potential test 
item. Defined only at this rather 
gross level, it is difficult to conceive 
of any measure of directed behavior 
which would be definitely excluded. 

A further assumption is made that 
a child’s response to any intellective 
task is affected not only by his gen- 
eral intelligence, but by other “non- 
intellective” factors such as “drive” 
and “hand-eye coordination.” While
Wechsler presents rather convincing
arguments for including such factors,
the discussion of this universe is lim- 
ited to a few examples. Wechsler 
he controlled for the dif- 
cts of these “nonintellec- 
1 s by including a wide va- 
riety of types of items (Wechsler, 
1944).

As the test is constructed, two sep- 
arate questions appear to be in- 
volved: (a) the sampling of the uni- 


combinations of subtests), and (b) 
i ems within each 
By looking at explicitly 
stated theory, there seems to be no 
way in which the adequacy of the 
sampling of “nonintellective” factors
can be ascertained, for no statements
are made to limit the possible range 
of factors. 

On a less formal level, however, 
there seem to be several factors often 
included in any “common” concept 
of intelligence, but not adequately 
represented in the test. While these 
“omissions” would be of little conse- 
quence if the WISC were demon- 
strated to have the desired predictive 
validity, they might provide fruitful 
hypotheses if such validity is found 
to be lacking in any particular situa- 
tion. Which test items, for instance, 
call for the integration of newly 
learned material into old contexts or 
for the memory of meaningful ma- 
terial? Further, the nature of the 
test situation rules out problem solv- 
ing which takes place outside of a 
one-to-one relationship with another 
person or which involves any but 
very short periods of time. 

The degree to which the items in- 
cluded within a given type represent 
an adequate sample for any particu- 
lar child is a problem common to all 
intelligence tests, and presents an- 
other large source of question for the 
WISC. It is obvious, for example, 
that the degree to which a child 
would have a chance to learn the 
answer to “who wrote Romeo and
Juliet?” or even “what is the color of 
rubies?” would differ markedly from 
one subculture to another. Yet suc- 
cess or failure on these items con- 
tributes equally to the IQ score no 
matter what the background of the 
child might have been. This type of 
criticism could also apply to subtests 
calling for specific skills such as put- 
ting puzzles together or manipulat- 
ing a pencil. 

In summary, the WISC appears to 
lack any explicitly stated, organized 
network of intuitive reasons for ex- 
pecting it to show predictive validity 
other than the very broad assumption
of a general factor which enters
into the purposeful solution of all 
problems—whether they occur in a test 
or in the child’s life. While Wechsler 
speaks convincingly of other, non- 
intellective factors which enter sig- 
nificantly into a child’s actual behav- 
ior in problem situations, there ap- 
pears to be little evidence that these 
factors are sampled in any systematic
manner. This forces the user
of the WISC to depend very heavily 
on whatever demonstrated criterion-oriented
and construct validity the
WISC might have. 


Predictive Validity 


If the use of the term predictive 
validity is restricted to correlations 
between the WISC and some nontest 
measure of predicted behavior ob- 
tained at some time subsequent to 
the administration of the WISC, 
there are no relevant studies in the 
literature reviewed. This is very sur- 
prising, as it is difficult to conceive 
of any situation in which the WISC 
might be used that would not involve 
the prediction of behavior. As it 
stands, this lack of explicit evidence 
of the value of the WISC in the pre- 
diction of subsequent behavior must 
be viewed as a major weakness of the 
test. 


Concurrent Validity 


In general, reports of the concur- 
rent validity of the WISC have been 
restricted to correlations between the 
WISC and other test measures of 
achievement or intelligence. Most 
of the studies relating the WISC to 
other intelligence tests have been 
oriented toward assessing the com- 
parability of the various IQ scores. 

Stanford-Binet. Studies reporting 
the comparability of WISC scores 
with S-B scores on different popula- 
tions of children began to appear soon 
after the publication of the WISC. 
A summary of these correlations appears
in Table 1.

Frandsen and Higginson (1951) 
reported a study on 54 fourth-grade 
children and concluded that “IQ
norms from the S-B and WISC are 
comparable at least within the range 
of one to two sigmas above and below 
the mean” (p. 283). This is the most 
favorable and unqualified statement 
of the comparability of the WISC 
and S-B appearing in the literature. 
An article by Pastovic and Guthrie
(1951) followed, summarizing the results
of five unpublished master’s theses.
They concluded “that the WISC IQ should
not be interpreted as equivalent to a
Binet IQ at age levels below 10 years of
age since the WISC score is consistently
lower than that of the Binet” (p. 385).
Krugman, Justman, Wrightstone and
Krugman (1951) found significant
differences between the WISC Full Scale
and Performance Scale IQs and the S-B IQ
at all age levels (5-15), which were
consistently in favor of the S-B.
Differences between the S-B and WISC
Verbal Scale tended to be significant
only at younger age levels. They
concluded further that “there is a
definite tendency for greater
differences... to be associated with the
higher Stanford-Binet IQ’s,” and that
differences between S-B and WISC Verbal
and Full Scale IQs “tend to be associated
with chronological age, in that such
differences are larger at younger age
levels” (p. 482).


TABLE 1

STUDIES REPORTING CORRELATIONS BETWEEN THE WISC AND STANFORD-BINET

                                                                                                                                 Correlations (a)
Author                            Subjects                                  N                            Age Range               V            P            FS

Frandsen and Higginson (1951)     4th grade children                        54                           9-1 to 10-3             .71          .63          .76
Krugman et al. (1951)             New York school children                  332 (166 boys, 166 girls)    [illegible]             .739         .644         .817
Nale (1951)                       Mental defective children                 104 (54 boys, 50 girls)      [illegible]                                       .909
Sloan and Schneider (1951)        Mental defective children                 40                           [illegible]             .751         .641         .493
Stacey and Levin (1951)           Mental defective children                 72 69 [remaining entries illegible]
Weider et al. (1951)              [illegible] children                      [illegible]                  5-0 to 7-11             [illegible]  [illegible]  .90
                                                                            [illegible]                  8-0 to 11-11            .92          .78          .8[?]
                                                                            Total                        5-0 to 11-11            [illegible]  .77          .80
Pastovic and Guthrie (1951) (b)   2nd grade children                        50                           7-6                     .82          .71          .88
                                  Kindergarten children                     50                           5-6                     .63          .56          .71
Clarke (1950) (b)                 5th grade children                        85                           11-1                    .83          .57          .79
Rapaport [date illegible]         Public school children                    100                          7-6                     .79          .74          .85
Cohen and Collier (1952)          Local Bloomington school                  51                           Mean 7-5                .82          .80          .85
Mussen et al. (1952)              A “highly select population”              39                           6-0 to 13-1             .83          .72          .85
Sandercock and Butler (1952) (c)  Mental defective children                 90                           10 to 16                .80          .66          .76
Triggs and Cartee (1953)          Mean S-B IQ 124.11                        46                           5-year-olds             .578         .478         .615
Arnold and Wagner (1955)          Elementary school children                50                           8- & 9-year-olds        .88          .74          .90
Gehman and Matyas (1956)          School children                           60                           Mean 11-1               .78          .46          .73
                                  Same group, 4 years later                 60 (29 boys, 31 girls)       Mean 15-2               .76          .64          .77
Stroud et al. (1957)              Children, Gr. [illegible], referred       621                          [illegible]             .87          .83          .94
                                  to [illegible]; mean IQ dull normal
Schachter and Apgar (1958)        S-B administered 50.8 mo. before WISC     113 (61 boys, 52 girls)      49.4 mo. (S-B),         [illegible]  [illegible]  .67
                                                                                                         100.2 mo. (WISC)

(a) V = Verbal Scale; P = Performance Scale; FS = Full Scale.
(b) Studies summarized by Pastovic and Guthrie (1951).
(c) Study conducted using Form M.
Entries marked [illegible] could not be recovered from the source copy.

It should be noted that a child can- 
not obtain an IQ above 154 on the 
WISC without extrapolation beyond 
the norms, while the S-B would allow 
much higher scores. This fact may 
explain in part the finding that the 
greater differences were associated 
with the higher S-B IQs. 

Weider, Noller, and Schraumm 
(1951) also found that while the S-B 
and WISC IQs are significantly cor- 
related, “the Binet IQ’s tend to be 
higher than the WISC IQ’s for the 
same children” (p. 332). A regression 
equation was suggested relating S-B 
to WISC Full Scale IQs in which 
WISC equals 0.85 Binet plus 11. Ac- 
cording to this formula, when S-B 
IQs are below 73, the WISC IQs 
would be higher than the S-B IQs. 
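
The arithmetic behind this crossover can be checked directly. The following is a minimal sketch in Python, assuming only the slope (0.85) and intercept (11) quoted above; the function names are illustrative and are not taken from Weider et al.

```python
# Regression quoted from Weider et al. (1951): WISC = 0.85 * Binet + 11.
SLOPE = 0.85
INTERCEPT = 11.0


def predicted_wisc(binet_iq: float) -> float:
    """Predicted WISC Full Scale IQ for a given S-B IQ (illustrative helper)."""
    return SLOPE * binet_iq + INTERCEPT


def crossover_iq() -> float:
    """S-B IQ at which the predicted WISC IQ equals the S-B IQ."""
    # Solve x = SLOPE * x + INTERCEPT for x.
    return INTERCEPT / (1.0 - SLOPE)


if __name__ == "__main__":
    print("crossover S-B IQ:", round(crossover_iq(), 1))
    for binet in (60, 100, 130):
        print(binet, "->", round(predicted_wisc(binet), 1))
```

The crossover falls at an S-B IQ of roughly 73.3, which agrees with the statement that predicted WISC IQs exceed S-B IQs only when the S-B IQ is below 73.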

Cohen and Collier (1952), Mus- 
sen, Dean and Rosenberg (1952), and 
Stroud, Blommers and Lauber (1957) 
also reported correlations between 
the S-B and WISC. Further evidence 
that the WISC tends to score chil- 
dren within normal and upper ranges 
lower than the S-B is presented by 
Kureth, Muhr and Weisgerber (1952) 
in their study of 100 five- and six- 
year-old children, and by Levinson 
(1959) in his study of 117 Jewish pre- 
school children. Triggs and Cartee 
(1953) tested 46 rather select children 
in the kindergarten of an independent 
school (S-B mean IQ of 124.11), and 


found WISC IQs to be consistently 
lower (Full Scale mean of 107.58). 
They concluded further that “there 
is a marked tendency for larger dif- 
ferences between Stanford-Binet and 
WISC IQ's to be related to higher 
Stanford-Binet IQ's” (p. 29). 

Arnold and Wagner (1955) ex- 
amined 50 children drawn at random 
from elementary schools and concluded
that “so far as this sample is
concerned, the relationship between
IQ’s obtained for eight- and
nine-year-olds with the WISC (Full Scale)
and Form L Binet is not significantly 
different from the relationship be- 
tween IQ’s obtained on Forms L and 
M of the Binet” (p. 93). The Verbal 
Scale related significantly better with 
the Binet than did the Performance 
Scale. 

Preschool S-B IQs were compared 
with the school-age WISC IQs of 113 
children selected at random from a 
clinic population born at a women’s 
hospital (Schachter & Apgar, 1958). 
Of the 404 children requested by mail 
to return for testing, 119 returned for 
both tests; six were eliminated for 
other reasons. The resulting correla- 
tion of .67 (see Table 1) between the 
S-B and WISC Full Scale IQs was 
reported to compare favorably with 
previously reported correlations be- 
tween preschool and school-age S-B 
IQs. 

The comparability of IQ scores of 
the WISC and S-B when applied to 
mentally defective children has been 
investigated by several authors. Nale 
(1951) found the rather high correla- 
tion of .909 between the WISC Full 
Scale and the S-B, Form L, for 104 
defective children, while Stacey and 
Levin (1951) and Sloan and Schneider 
(1951) report correlations of .68 and 
.493 respectively. In general, the
WISC Full Scale was found to score 
somewhat higher than the S-B for 
these defective children.

Sandercock and Butler (1952) compared
the WISC and S-B, Form M,
IQs of 90 mentally defective children
and concluded that “correlations
obtained between the Stanford-
Binet (M) and the three WISC IQs 
indicate a high degree of relationship 
between the Binet and WISC Ver- 
bal” (p. 104). 

Several of the conclusions and as- 
sumptions made by various authors 
were subjected to direct test by Hol- 
land (1953) who found in part: (a) 
There was no significant practice ef- 
fect on the WISC IQs when the S-B 
was given first and the median inter- 
val between the tests was seven days. 
(b) There was a significant difference 
between the correlations of the S-B 
with the Performance and with the 
Verbal and Full Scales of the WISC 
(in favor of the Verbal and Full 
Scales). (c) There was no significant 
difference between the correlations of 
the S-B with the Verbal and Full 
Scales of the WISC. (d) There was 
no significant relationship between 
chronological age and the difference 
between S-B and WISC IQs. (e) 
There was no significant relationship 
between S-B IQ and the difference 
between S-B and WISC IQs. 

In general, the following conclu- 
sions can be drawn from these data 
about the comparability of the WISC 
and S-B IQs.


1. Studies involving a variety of
ages and IQ ranges are very consistent
in showing that at least within
a white American school population 
the WISC and Stanford-Binet scores 
are related to a significant degree. 
Correlations between the WISC Full 
Scale and the S-B are predominantly 
reported within the .80’s.

2. The WISC scores tend to be 
lower than S-B scores for the same 
children within the middle and upper 
ranges and somewhat higher for de- 
fectives. This appears to be particu- 
larly true for younger children (below 
10) and for the higher S-B scores. 

3. Using the S-B as a criterion, the
highest correlations are found with
the Full Scale IQ scores, the next
highest with Verbal, and lowest with
Performance scores.

Wechsler-Bellevue. The fact that 
the Wechsler-Bellevue (W-B) and the 
WISC overlap for the years 10 
through 15 has led to several studies 
investigating the comparability of 
the WISC and W-B scores. The cor- 
relations reported are summarized in 
Table 2. 

Knopf, Murfett, and Milstein (1954),
feeling that the many similarities
between the WISC and the W-B may suggest
a comparability which is not actually
there, administered the W-B and WISC to
30 Junior High School boys. They found
that, while the WISC and W-B scores are
highly correlated, the Verbal and Full
Scale scores on the WISC are
significantly higher (at the .01 level).
The Performance Scales, on the other
hand, were not significantly different.


TABLE 2

STUDIES REPORTING CORRELATIONS BETWEEN THE WISC AND THE WECHSLER-BELLEVUE, FORM I

                                                                                                     Correlations (a)
Author                        Subjects                      N                         Age Range       V      P      FS

Delattre and Cole (1952)      Public school children        50                        10-5 to 15-7    .86    .82    .87
Vanderhost et al. (1953)      Mental defective children     38 (22 boys, 16 girls)    11 to 16        .54    .77    .72
Knopf et al. (1954)           Jr. high school boys          30                        13-4 to 14-6    .83    .64    .87

(a) V = Verbal Scale; P = Performance Scale; FS = Full Scale.

Price and Thorne (1955), testing 
two groups of white American public 
school children, found that at both 
the 11½- and 14½-year levels the
WISC Full Scale and Verbal Scale 
IQ means tended to be higher than 
the corresponding W-B means, and 
that the direction of this difference 
was reversed for the Performance 
Scales. The authors set up criteria 
that two tests should be judged 
equivalent if, allowing for chance 
variation, (a) the individual should 
obtain essentially the same ranking 
on both tests, and (b) he should ob- 
tain essentially the same scores. By 
these criteria, at the 11½-year level
the Verbal Scales were found to be 
lacking on both (a) and (b); the Per- 
formance Scales were found to be 
lacking on (b) and the Full Scales 
were remiss on neither. At the
14½-year level the Verbal and Full Scales
were lacking on (b) and the Per- 
formance on (a). 

Using as Ss a group of 38 high- 
grade and borderline mental defec- 
tives, Vanderhost, Sloan and Bens- 
berg (1953) also found the WISC 
Verbal Scale to score significantly 
higher than the W-B Verbal Scale, 
while no significant difference was 
found in Performance Scales. They 
concluded that because of this tend- 
ency for the W-B Verbal Scale to 
score significantly lower than the 
WISC Verbal Scale, the WISC is the 
preferred test to use on mental defec- 
tives in the 10- to 16-year range. 

The following conclusions may be 
drawn about the comparability of the 
WISC and Wechsler-Bellevue in the
age range over which they overlap. 

1. The two scales appear to be re- 
lated to a significant degree. Full 
Scale correlations are reported in the
.70’s and .80’s. 

2. The W-B Verbal Scale scores 
tend to be significantly lower than 
the WISC Verbal Scale scores for the 
same child. It may well be that the 
WISC items are more appropriate at 
this age level. 

Other individual intelligence tests. 
In the following studies the WISC 
has often been used as the criterion 
against which the other test is vali- 
dated. The results of these studies 
are reported in Table 3. 

Three studies (Cohen & Collier, 
1952; Pastovic & Guthrie, 1951; 
Sloan & Schneider, 1951) are reported 
in which the WISC has been corre- 
lated with the Arthur (see Table 3). 
The Arthur, as might be expected, 
appears to correlate better with the 
WISC Performance Scale than with 
the Verbal Scale. 

Because of the length of time 
needed to administer the WISC, 
Martin and Wiechers (1954) investi- 
gated the possibility that the Colored 
Progressive Matrices could be used as 
a measure of intelligence with greater 
brevity than the WISC and a similar 
degree of validity. One hundred nine- 
year-old children from four Indiana 
schools were given the Matrices and 
the WISC in counterbalanced order. 
The authors concluded that “‘in view 
of these high correlations (see Table 
3) and the ease and speed of admin- 
istration it would seem that the Col- 
ored Progressive Matrices will find 
more extensive use in the clinical 
testing of children” (p. 144). 

Following the positive results ob- 
tained by Martin and Wiechers, 
Stacey and Carleton (1955) investi- 
gated the degree to which the WISC 
and S-B scores of Ss for a restricted 
range of intelligence (possible mental
defectives) compared with performance
on the Colored Progressive
Matrices. They found much lower
correlations. 

Motivated also by the time factor,
Barratt (1956) investigated the rela- 
tionship between the WISC and the 
1938 edition of the Progressive Ma- 
trices. Using 70 children who made 
up the entire fourth grade of a school, 
Barratt found correlations of .692, 
.699, and .754. 

Because of the small number of 
studies reported, it is difficult to draw 
more than very tentative conclusions 
about the relation between the WISC 
and either form of the Progressive 
Matrices. It does appear, however, 
that when the Colored form is ap- 
plied to a group of children with a 
normal spread of IQ scores, fairly 
high correlations can be expected, 
and that the Verbal and Performance 
Scales correlate equally well. 

Investigating specifically the prob- 
lem of testing children with reading 
difficulty, Smith and Fillmore (1954) 
reported a study correlating the 
WISC with the Ammons Full Range 
Picture Vocabulary Test, and con- 
cluded that as a screening device of 
intelligence the Ammons can be used 
with children with reading handicaps. 

Delp (1953a), as part of a larger 
study, gathered data to compare the 
Kent Emergency Scales (EGY) with 
the WISC. He concluded that in 
view of the rather low correlations 
the primary value of the Kent EGY 
is not its correlation with the WISC, 
but its particular type of questions. 

As part of a study to determine 
whether currently available tests 
would predict school achievement for 
bilingual pupils on the Territory of 
Guam, the WISC was administered 
to a sample of 51 fifth grade children 
(Cooper, 1958). In spite of the lan- 
guage handicap, significant correla- 
tions were reported with the Leiter 
International Performance Scale and 


the Columbia Mental Maturity Scale. 

Group intelligence tests. Correla- 
tions between the WISC and the Sci- 
ence Research Associates Primary 
Mental Abilities Test are reported 
by Stemple (1953) and are shown in 
Table 4. 

Altus (1952) reported correlations 
between the WISC and the California 
Test of Mental Maturity (CTMM). 
She selected a sample of 55 Junior 
High School children so as to repre- 
sent the entire student body as to 
age, sex, proportion in each grade, 
proportion of bilinguals and IQ as 
measured by the CTMM. The cor- 
relation of .81 between the WISC 
Full Scale and the CTMM Total led 
her to conclude that “the WISC 
probably has considerable validity in 
comparable school settings” (p. 231). 

A second study by Altus (1955), 
which was undertaken to test the as- 
sumption that the verbal and non- 
verbal portions of the WISC and the 
CTMM are significantly related, re- 
ported further correlations between 
these two tests (see Table 4). The 
100 students referred to the guidance 
department by teachers included 36 
who were referred for special training 
classes for the mentally retarded. 
Altus felt justified to conclude that 
“within a comparable school referral 
setting, the WISC and CTMM are 
markedly comparable as to group 
assessment and roughly comparable 
as to individual abilities.” 

While the three studies reviewed 
all report rather high correlations be- 
tween the WISC and group intelli- 
gence tests, again the small number 
of studies precludes more than the 
very tentative acceptance of these 
conclusions. 

Achievement tests. Mussen et al.
(1952) reported a study with a group 
of Ohio State University elementary 
school children correlating WISC 
scores with various measures of
achievement. These correlations 
vary from .29 to .81. The fact that 
the intellectual range was limited by 
the “highly select population” may 
well have affected the obtained cor- 
relations adversely. 
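
How much a restricted range can depress a correlation may be illustrated with the standard formula for direct selection on one variable. The sketch below is not taken from Mussen et al.; the unrestricted correlation and the ratio of standard deviations are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
import math


def restricted_r(unrestricted_r: float, sd_ratio: float) -> float:
    """Correlation expected after direct selection on one variable.

    sd_ratio is the restricted SD divided by the unrestricted SD
    (a hypothetical input here, not a value reported in the studies).
    """
    r, u = unrestricted_r, sd_ratio
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


if __name__ == "__main__":
    # Hypothetical case: r = .80 in the full range, SD cut in half by selection.
    print(round(restricted_r(0.80, 0.5), 3))
```

Under these assumed values a correlation of .80 in an unselected group would shrink to about .55 in a group whose spread of scores is half as large.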

Frandsen and Higginson (1951) 
found rather consistent middle range 
correlations for fourth-grade children 
between the WISC scores and the 
Stanford Achievement Total score. 

Barratt and Baumgarten (1957) 
related WISC scores to scores on the 
reading and arithmetic subtests of the 
California Achievement Tests for 30 
achievers and 30 nonachievers in 
grades four to six. The achievers 
scored significantly higher on all 
scales of the WISC than the non- 
achievers. In both cases the Verbal 
Scales correlated higher with the 
reading subtest than did the Per- 
formance Scale. The almost chance 
relationship found between the WISC 
IQ’s and the arithmetic achievement 
for achievers contrasted with the sig- 
nificant relationship between the two 
tests for nonachievers suggests 
strongly that other important vari- 
ables are involved. 

Sandercock and Butler (1952) 
found low positive correlations be- 
tween a measure they call the 
Achievement Quotient and the WISC 
Scales for 90 mentally defective chil- 
dren. The Achievement Quotient 
was derived from judgments of the 
child's academic progress relative to 
his age. Further correlations with 
test measures of achievement for
delinquent children were found by
Richardson and Surko (1956). 

Stroud et al. (1957) wished “. . . to
determine the effectiveness with 
which all or various combinations of 
the WISC subtests could be used to 
predict performance on Reading 
Comprehension, Arithmetic, and 
Spelling tests of the Iowa Tests of
Basic Skills battery” (p. 18). The 
tests were administered to 725 pupils 
in grades three to six drawn from a 21 
county area in Iowa. All of the chil- 
dren had been referred for psycho- 
logical interviews and testing and 
“were in, or were thought to be 
in, some kind of school difficulty” 
(p. 18). The mean IQs were within 
the dull normal range. All of the 
various intercorrelations were calcu- 
lated and beta weights for the vari- 
ous subtests determined. The au- 
thors found that the Arithmetic, Vo- 
cabulary, Block Design, and Object 
Assembly subtests were most effec- 
tive in prediction for both the orig- 
inal group and a cross validation 
group of 129 like pupils. They concluded
that their study gave no support
for the use of separate verbal,
nonverbal, and subtest scores in
differential prediction.

The relation of the WISC IQ to 
another form of achievement was in- 
vestigated by Robinowitz (1956) 
who wished to discover whether the 
brighter child as measured by an in- 
telligence scale is the one who learns 
the relationship of opposition at an 
earlier age. Robinowitz found a sig- 
nificant difference (at the .01 level) 
in scores on the WISC between those 
children who were able to learn the 
relationship and those who were not. 
A point bi-serial correlation of .609 
was found between the ability to 
learn the relation of opposition and 
scores on the WISC. 
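
For readers unfamiliar with the statistic, a point-biserial correlation relates a continuous score to a two-valued outcome (here, learned or did not learn the relation). The following is a minimal sketch in Python; the scores and outcomes are invented for illustration and are not Robinowitz’s data.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, passed):
    """Point-biserial r between a continuous score and a 0/1 outcome."""
    group1 = [s for s, p in zip(scores, passed) if p]
    group0 = [s for s, p in zip(scores, passed) if not p]
    p = len(group1) / len(scores)
    q = 1.0 - p
    # pstdev is the SD of all scores with n (not n - 1) in the denominator.
    return (mean(group1) - mean(group0)) / pstdev(scores) * math.sqrt(p * q)


if __name__ == "__main__":
    # Hypothetical IQ scores and whether each child learned the relation.
    iqs = [90, 95, 100, 105, 110, 115]
    learned = [0, 0, 0, 1, 1, 1]
    print(round(point_biserial(iqs, learned), 3))
```

The value is numerically identical to an ordinary product-moment correlation computed with the outcome coded 0 and 1.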

While not directly related to 
achievement, Mussen et al. (1952) 
reported correlations between teach- 
er’s rating of intelligence on the Hag- 
gerty-Olson-Wichman Rating Scale 
of Intelligence and the WISC of .64, 
.53, and .68 for the Verbal, Performance,
and Full Scales respectively.

In general it would seem that the
relationship between ability and 
achievement must be recognized as 
highly involved and complex, and 
should be subjected to much further 
investigation. At present it seems 
safe to say only that the WISC re- 
lates to scores on certain types of aca- 
demic achievement tests for certain 
groups of children quite well. In 
general, the Verbal Scale seems to 
relate to test-measured academic 
achievement better than the Per- 
formance Scale. 


Construct Validity 


While an attempt at a full appraisal 
of the construct validity of the WISC 
would go far beyond the scope of this 
article, a few comments seem to be in 
order. 

Concerning construct validity, 
Cronbach and Meehl (1955) state 
that “unless the network (nome- 
thetic) makes contact with observa- 
tions, and exhibits explicit, public 
steps of inference, construct valida- 
tion cannot be claimed” (p. 291). At 
present, since little independent ra- 
tionale exists for the WISC, it would 
seem that only a few rather general 
hypotheses could be drawn from the 
conceptual framework behind the 
WISC. In few studies is there an at- 
tempt to make these steps of infer- 
ence explicit and public. 

General intelligence. The assump- 
tion of the global nature of general in- 
telligence is basic to the development 
of the Wechsler scales (Wechsler,
1944, 1949, 1958; Wechsler & Weider, 
1953), and would imply that the 
WISC should correlate with other 
measures of general intelligence. The 
studies discussed under the heading
of Concurrent Validity lend support
to this view of general intelligence. It
should be noted, however, that these
studies lend support only to the assumption
of a general trait which
underlies all test behavior. The 
broader assumption of a general trait 
entering into all purposeful behavior 
both in and out of test situations is 
not touched by these studies. 

Nonintellective factors. Also basic 
to Wechsler’s theoretical position is 
the assumption that the particular 
subtests used in the WISC tap not 
only general intelligence, but other 
“nonintellective” factors. Some of
these factors are specific to the par- 
ticular subtest (e.g., specific skills 
such as memory); others are more 
general and affect several or all of 
the subtests (e.g., “drive”). While
these assumptions fit well into general
testing theory in accounting for
the various intercorrelations, it is 
very difficult to find any explicit 
statements about which subtests are 
affected by what other factors. 

Both in discussion of the WISC 
and in its use a distinction is made 
between the Verbal and Performance 
Scales. Wechsler (1958) tentatively 
identifies the factors as measured by 
the adult scales as a verbal compre- 
hension factor and a nonverbal or- 
ganization factor (variously identified 
as performance, nonverbal, space 
and visual-motor organization).
Gault (1954) reported a factor analysis
of the intercorrelations printed
in the WISC Manual (Wechsler,
1949) and found the same general
pattern of factors in the WISC as
was reported by Hammer (1950) for
the adult scales. The four factors
worthy of note were called a “general
eductive factor, a verbal comprehension
factor, a spatial-perceptual
factor and a memory factor” (p. 87).
The verbal comprehension factor and
the spatial-perceptual factor cor- 
respond roughly with the Verbal and 
Performance Scales. 


Lotsof, Comrey, Bogartz, and 
Arnsfield (1958) reported a factor
analysis of WISC and Rorschach 
scores of 72 under-achieving children 
with reading disabilities. They found 
four factors which they called verbal 
intelligence, productivity,
perceptual-movement, and performance speed.
The Verbal and Performance Scales 
were not factorially pure, however; 
the Block Design was loaded signifi- 
cantly with the verbal intelligence 
factor, and Comprehension and 
Arithmetic were loaded with the performance
speed factor. They concluded
that “the verbal and performance
aspects of the WISC are not
independent of each other” (p. 301).

In general, evidence seems to support
the rough factorial distinction
between the Verbal and Performance 
Scales. Beyond this evidence on the 
division of the WISC into Perform- 
ance and Verbal Scales, there seems 
to be no systematic investigation of 
the nature of any other of the some- 
what general or specific factors 
tapped by the WISC subtests. This is 
of particular importance in evaluat- 
ing the clinical use of the WISC and 
will be discussed in a later section. 


CHARACTERISTICS OF THE WISC 

AS A MEASURING INSTRUMENT 

As with any measuring device, the 
user of an intelligence test must be 
familiar with the characteristics and 
idiosyncrasies of the test to be taken 
into account in any interpretation of 
the results. Several studies have been 
aimed either directly or indirectly at 
furnishing the WISC user with this
information.


Reliability


Wechsler (1949) and Seashore et al.
(1950) report coefficients of internal
consistency (split-half reliabilities
corrected by the Spearman-Brown formula)
for all scales and all subtests but
Coding, Digit Span, and Mazes, at the
7½-, 10½-, and 13½-year levels. These
figures range from .86 to .96. The
coefficients of internal consistency for
the various subtests range from .59 for
Comprehension and Picture Completion at
the 7½-year level to .91 for Vocabulary
at the 10½-year level. The standard
errors of measurement in IQ points for
the three age levels for the Verbal
Scale, Performance Scale, and Full Scale
range from 3.00 to 5.61.
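
The two computations mentioned here, the Spearman-Brown correction of a split-half correlation and the standard error of measurement, can be sketched briefly. The code below is illustrative only; it assumes the WISC deviation IQ has a standard deviation of 15, and the function names are not drawn from the WISC Manual.

```python
import math

IQ_SD = 15.0  # assumed standard deviation of WISC deviation IQs


def spearman_brown(half_test_r: float) -> float:
    """Full-length reliability estimated from a split-half correlation."""
    return 2.0 * half_test_r / (1.0 + half_test_r)


def standard_error_of_measurement(reliability: float, sd: float = IQ_SD) -> float:
    """Standard error of measurement in score units."""
    return sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    for reliability in (0.96, 0.86):
        print(reliability, round(standard_error_of_measurement(reliability), 2))
```

Under that assumption, reliabilities of .96 and .86 correspond to standard errors of about 3.00 and 5.61 IQ points, which agrees with the range reported above.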

Both Wechsler (1949) and Sea- 
shore et al. (1950) warn the user to 
take into account the fairly low relia- 
bilities of some of the subtests in in- 
terpreting either the absolute sub- 
test scores or relations between them. 
For instance, at the 7½-year level
only Vocabulary, Picture Arrange- 
ment, Block Design, and Mazes have 
coefficients of internal consistency 
above .70, while Comprehension and 
Picture Completion fall as low as .59. 
The reliability of the test in general 
tends to increase with age, so that 
at age level 13½ all subtests except
Digit Span (.50) and Picture Com- 
pletion (.68) are above .70. 

The stability of the WISC scores 
over a four-year period has been in- 
vestigated by Gehman and Matyas 
(1956). Sixty children were tested in 
the fifth grade and again in the ninth 
grade. Coefficients of stability for the 
three scales were: Verbal Scale, .77; 
Performance Scale, .74; and Full 
Scale, .77. 


Sensitivity to Other Factors 


Any measuring device, be it a sur- 
veyor’s tape or an intelligence test, 
can be influenced by factors other 
than the ones the user wishes to meas- 
ure. While WISC users appear to be 
aware of this fact, few studies appear 
which give direct information with 
which to evaluate any particular 
WISC examiner-child interaction.

Practice effects. Holloway (1954), 
in an attempt to investigate the ef- 
fect of a particular kindergarten pro- 
gram on the IQ scores of children, 
found that both his control and experi- 
mental groups showed significant 
gains (at the .01 level) in WISC Full 
Scale IQs over what appears from 
his report to be approximately a six- 
month period. The problem sug- 
gested by this study of the practice 
effects on repeated administrations 
of the WISC given over relatively 
short periods of time has not, to the 
writer’s knowledge, been subjected 
to further direct investigation. 

In studies in which the WISC and 
S-B or W-B have been administered 
in close temporal proximity, the authors
have consistently reported no
significant practice effects on the 
WISC scores (Kureth et al., 1952; 
Holland, 1953). It would not be safe 
to generalize from these findings to 
the WISC, however, for the case in 
which the test items are identical 
rather than more or less similar might 
well be different. This would seem 
especially true of performance items 
in which an ‘insightful’ solution 
might be retained or of verbal items 
which might be taken back home or 
into the school room and discussed 
with others. 

Variables in the test situation. The 
possible effects of differences in the 
examiner’s technique of administra- 
tion is another problem area which 
has not received the attention it mer- 
its, as is the whole field of possibilities 
arising from the relation between the 
examiner and the child and the cir- 
cumstances of the examination. This 
is surprising, as the importance of
these variables appears to be generally
assumed.


Range of Application of the WISC


The literature provides considerable
evidence that the WISC cannot
be applied indiscriminately to all 
groups without considerable revision 
of the interpretation of the IQ score. 

Southern Negro children. In con- 
nection with another study Young 
and Pitts (1951) tested 40 southern 
Negro children who were selected as 
a control group representative of 
their culture. These children were 
not retarded by socioeconomic criteria
or by the judgment of observers.
The mean WISC Full Scale IQ
score of this group was, however, 
69.8. To follow up on these results, 
Young and Bright (1954) tested a 
larger group of southern Negro rural 
children, and again found the mark- 
edly low mean WISC Full Scale IQ 
score of 67.74. The authors con- 
cluded that “We must question 
whether the WISC is a suitable test 
for the southern Negro child” (p. 
220). 

Bilingual children. Altus (1953) 
investigated the applicability of the 
WISC to children of bilingual Mexi- 
can descent. She compared the test 
patterning of these children with uni- 
lingual children equated for age, sex 
and performance IQ and found that 
the Verbal Scales of the bilingual 
group were lower than the Perform- 
ance Scales to a highly significant de- 
gree (a difference of nearly 17 points). 
No significant difference was found 
for the unilingual group. While this 
study was conducted with a group of
children referred for consideration for
placement classes for mentally retarded,
it again points out the need to
exercise care in interpreting the
IQs obtained from a different group.

Levinson (1959) administered several
intelligence tests to 117 Jewish
preschool children and found that the
means on all three WISC scales were
higher at the .05 level of confidence
for unilingual children than for
bilingual children.

Socioeconomic status. The possible 
effect of socioeconomic status was 
considered by Estes (1953) who
administered the WISC to two groups
of second- and fifth-grade children
differing in socioeconomic status as
measured by the Warner-Meeker-Eells
Index of Status Characteristics.
Significant differences in favor of the
higher level were found in the total 
group of children and for the second- 
grade children. The difference for 
fifth-grade children was not signifi- 
cant. Levinson (1959), on the other 
hand, found no correlation between 
IQ level and socioeconomic back- 
ground for Jewish preschool children. 

Estes (1955) reported a follow-up 
of the earlier study in which 18 of the 
upper and 14 of the lower socioeco- 
nomic group were retested after a pe- 
riod of two years. The significant 
differences found when the children 
had been in the second grade no 
longer existed. The authors felt that 
this lessening of the effect of socio- 
economic status reflected the in- 
creased “leveling” influence of the 
school with the passage of two years. 

Laird (1957) tested two groups of 
11-year old children differing in so- 
cioeconomic status but matched for a 
number of other variables. The mean 
score of the upper socioeconomic 
group fell within the bright normal 
range while the lower group had a 
mean score falling within the aver- 
age range. Greater differences were 
found between Verbal and Full Scale 
scores than between Performance
scores.

Mentally retarded. The largest
single subgroup to which the WISC
has been applied is the group of
mentally retarded or deficient
children. The question of the adequacy
of the WISC when used with these
children was brought up by Carleton
and Stacey (1955), who reported an 
item analysis of the WISC for “a 
sample of 366 subjects tested at Syra- 
cuse State School who can be classi- 
fied as defective, borderline and dull 
normal” (p. 149). They found that 
for these children (a) relatively few 
items are misplaced with respect to 
order of presentation, and such mis- 
placement as does occur does not 
seem to be of sufficient extent to af- 
fect materially the subtest total 
score, and (b) for each subtest there 
is a relatively abrupt shift from items 
which appear to be quite easy to ones 
which are quite difficult so that there 
are relatively few items of the middle 
range of difficulty. 

A study by Stacey and Portnoy 
(1950) investigated the assumption 
that mental defective children will 
give responses to the WISC Vocabu- 
lary subtest at a lower conceptual 
level than borderline children. Two 
groups of children were tested (24 
mental defective and 27 borderline) 
and their vocabulary responses were 
scored descriptive, functional, and 
categorical as representing increas- 
ing levels of concept formation. Con- 
trary to expectation the borderline 
children gave significantly less func- 
tional and significantly more descrip- 
tive responses. 

Deaf children. The possibility of 
using the WISC Performance Scale 
with deaf children was investigated 
by Graham and Shapiro (1953). 
Three groups of children were 
matched for physical health, sex, 
color, nativity, age and Goodenough 
Draw-a-Man IQ. Group (a) con- 
tained children with a 60 db or 
greater loss of hearing in both ears 
sustained prior to significant lan- 
guage development. The test had to 
be modified somewhat to make pan- 
tomime instructions possible. Groups 
(b) and (c) contained children with
normal hearing. Each child was
administered the WISC Performance
Scale; Groups (a) and (b) with panto- 
mime instructions, and Group (c) 
with usual instructions. They 
found that Groups (a) and (b) did 
not differ significantly from each 
other, but were both significantly 
lower than Group (c). 

The authors concluded that while 
the WISC Performance Scale cannot 
be used without modification as a 
valid measure of the intelligence of 
deaf children, it seems feasible to use 
a correction factor to nullify the ef- 
fects of the pantomime instructions. 
In any case, they felt, the Performance
Scale can be administered via
pantomime as a crude measure. 

Very young children. No studies 
are reported concerning the applica- 
bility of the WISC to the testing of 
very young children. It should be 
noted, however, that a child with a 
“mental age” of five or six or below 
would in effect be given subtests with 
as few as four or five items. The re- 
liability of such short scales would be 
open to considerable question. In 
order to use the test at these ages, 
more items need to be added to the 
lower end of most scales. This criti- 
cism would, of course, apply to the 
use of the WISC with retarded chil- 
dren below the age of eight or nine 
years. 


Summary: 


1. There is strong evidence that 
WISC norms are not applicable to 
children of markedly different sub- 
groups such as southern Negro and 
bilingual Mexican-American chil- 
dren. 

2. Socioeconomic status appears to 
be a significant variable affecting the 
IQ scores of young children (second- 
as opposed to fifth-grade children), 
such that the children of higher
socioeconomic status tend to obtain
higher scores. 

3. The WISC seems to be rela- 
tively insensitive to differences 
among mentally retarded children. 

4. The WISC Performance Scale 
when administered with pantomime 
instructions to either normal or deaf 
children can be used as a crude and 
spuriously low measure of intelli- 
gence. 

5. When the WISC is administered 
to children with “mental ages” below 
five or six years, the IQ scores can be 
expected to be relatively unreliable 
due to the limited number of “func- 
tional” test items at the low end of 
the scale. 


Short Forms of the WISC 


Two articles report attempts to develop
short forms of the WISC.
Carleton and Stacey (1954) made up 
21 different short forms of the WISC 
from the WISC records of 365 chil- 
dren who had been referred to the 
Syracuse State School for evaluation 
and for whom there was no suspicion 
of organic involvement (IQ range 46 
to 91). They correlated each of these 
short forms with the Full Scale IQ 
finding correlations which ranged from .64
for a two subtest combination
(Comprehension and Vocabulary) to .88
for a five subtest combination of 
Comprehension, Arithmetic, Block 
Design, Coding, and Picture Com- 
pletion. 

Less hopeful results were reported 
by Yalowitz and Armstrong (1955), 
who derived three short form com- 
binations from the WISC records of 
229 children referred for numerous 
reasons to a child guidance clinic.
Correlations with the Full Scale IQs
ranged from .55 to .61. The authors
felt that these low correlations may
be attributed either to the “wide
subtest scatter found in WISC records of
emotionally disturbed children, or
the lower subtest intercorrelations
found in the WISC than on the
Wechsler-Bellevue” (p. 277).
Armstrong (1955) divided the Vo- 
cabulary subtest of the WISC into 
two short forms consisting of odd and 
even words. The over-all split-half 
correlation for all ages five years no
months to 14 years 11 months was
.88. She concluded that “the loss of
reliability involved in using either 
alternate word list instead of the total 
Vocabulary list is minimal, especially 
when compared to the time saved” 
(p. 414). 
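
As a rough check on how much reliability is at stake, the Spearman-Brown formula can project the reliability of the full Vocabulary list from the reported odd-even correlation. The sketch below assumes that the two half lists behave as parallel forms, which the report as summarized here does not establish; the function name is illustrative.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Projected reliability when a test is lengthened by length_factor."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


r_odd_even = 0.88  # odd-even correlation reported by Armstrong (1955)
r_full = spearman_brown(r_odd_even)
print(round(r_full, 3))
```

Under that assumption the full list would be expected to reach a reliability of about .94, against .88 for either half list.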


The Problem of Mental Age 


The departure from the use of the 
concept of mental age has led both 
to comments and to suggested ways 
of finding an MA from the WISC 
scores. Grove (1950) felt that while 
the publication of the WISC was a 
real contribution, Wechsler had 
“thrown the baby out with the wash” 
when he discarded the concept of the 
MA along with its use as a “practical 
method of defining levels of test per- 
formance.” The author then pro- 
vided a method by which a mental 
age score could be obtained. 

Wechsler (1951) himself, while still 
opposed to the MA as a measure of 
absolute intelligence, admitted that 
the MA concept has a use in compar- 
ing a child of a given age with chil- 
dren of his own age in performance 
on a given test. This test age, he felt, 
must be interpreted as a measure of 
“specific aptitude.” He then out- 
lined three different methods by 
which scores corresponding to “test
age” can be calculated.

Kolstoe (1954) compared the performance
of 29 third- and fourth-grade
children (S-B IQ 116 or above)
with 29 eighth- and ninth-grade
children (S-B IQ 84 or below) on 11 of
the 12 subtests of the WISC.
Differences significant at the .05 level of
significance were found on only three 
of the subtests. They concluded that 
their results “support to a considerable
extent the generality of the
mental age concept” (p. 167). 


THE WISC AS A DIAGNOSTIC
INSTRUMENT


In keeping with the growth of 
clinical psychology, tests previously 
used within a circumscribed area of 
prediction are finding use as more or 
less general diagnostic instruments. 
The WISC is, of course, a relatively 
standard sample of a child’s behavior 
and, as such, can be used as any other 
“sample.” Completely “disorganized”
behavior, for instance, will have
grossly similar diagnostic implica- 
tions whether it occurs on the WISC, 
the Rorschach, or during a clinical 
interview. Beyond this use, however, 
there is a tendency to attempt to pre- 
dict a wide variety of types of behav- 
ior from scores derived from the 
WISC. 


Patterns of Subtest Deviations 


As one might expect, the almost 
unlimited possibilities presented by 
10 variables have engendered numer- 
ous hypotheses about how these vari- 
ables relate to various aspects of a 
child’s behavior. The problem of de- 
fining a “significant” deviation has 
been considered by Alimena (1951) 
who reported a method for achieving 
comparability of scores on the 
Wechsler subtests (for all Wechsler 
tests) and for evaluating their dis- 
persion, based on the expected degree 
of trait variation within the indi- 
vidual. The author reported that the 
deviation norms have been calculated 
for the WISC and are available on re- 
quest from him. 

Differences between Verbal and Performance
scores. Recognizing that
many WISC users tend to attribute 
meaning to any differences between 
a child’s Verbal and Performance 
Scale scores, Seashore (1951) turned 
to Wechsler’s original standardiza- 
tion data to investigate the meaning 
of such discrepancies. The WISC 
was originally designed so that the 
difference between average Verbal 
IQs and average Performance IQs 
was zero. Seashore found that the 
sigma of the difference scores for all 
ages was 12.5 and that the discrep- 
ancy scores closely approximate a 
normal distribution with mean 0.0. 
There were no important age dif- 
ferences in discrepancy scores. 
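
If the discrepancy scores are taken to be normally distributed with mean 0 and sigma 12.5, as these figures suggest, the proportion of children expected to show a Verbal-Performance difference of a given size follows directly. The sketch below is an illustration under that normality assumption, not a reproduction of any tabled percentages from the standardization data; the function name is hypothetical.

```python
import math


def two_sided_tail(diff, sigma):
    """P(|D| >= diff) when D is normal with mean 0 and the given sigma."""
    z = abs(diff) / sigma
    return math.erfc(z / math.sqrt(2.0))


SIGMA_DIFF = 12.5  # sigma of Verbal-minus-Performance differences (Seashore, 1951)
for d in (10, 15, 25):
    print(d, round(two_sided_tail(d, SIGMA_DIFF), 3))
```

On these assumptions a difference of 12.5 points or more in either direction would be expected in roughly 32% of children, and a difference of 25 points or more in fewer than 5%.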

Investigating the possible effects of 
group differences on the distribution 
of deviation scores, Seashore found 
no appreciable differences between 
rural and urban children, and that 
the feeble-minded group did not have 
a Performance score higher than
their Verbal scores. Further, among 
the nine parental groups, only Pro- 
fessional and Semiprofessional 
showed any differences between mean 
Verbal IQ and mean Performance 
IQ. (Mean Verbal was about three 
points higher than mean Performance 
for both groups.) 

Newman and Loos (1955) investi- 
gated specifically whether there are 
differences between the Verbal IQ 
scores and Performance IQ scores for 
mentally defective children. They 
found that (a) mentally defective 
children classed as familial (N=128) 
obtained significantly higher scores
on the Performance tests than on the
Verbal tests (mean difference was
8.07), (b) mentally defective children
classed as undifferentiated (N=75)
also performed significantly higher on
the Performance than on the Verbal
tests, but to a lesser degree than the
familial (mean difference was 4.8),
(c) mentally defective children due 
to brain damage or birth trauma and 
giving no evidence of severe motor 
defect showed no difference, and (d) 
the brain-damaged children showed 
significantly lower Performance 
scores than the undifferentiated 
group. 

Both Sloan and Schneider (1951) 
and Stacey and Levin (1951) also 
found the Performance Scale to score 
significantly higher than the Verbal 
Scale for the mentally deficient chil- 
dren they examined. In general, it 
seems that one should expect men- 
tally retarded children classified as 
familial or undifferentiated to obtain 
higher Performance than Verbal Scale 
scores. 

On the other hand, Atchison (1955) 
found that the 80 feeble-minded 
Negro boys and girls he tested tended 
to score higher on the Verbal Scale 
than on the Performance Scale, re- 
versing the differences found above. 
It would seem safe to assume that 
there are important variables in- 
volved in the relationship between 
Verbal and Performance Scale scores 
which were not controlled adequately 
in the above studies. 

Application of Hypotheses from the 
W-B. Hypotheses abound concerning
patterns of deviations on the
W-B and Delattre and Cole (1952)
were concerned lest psychologists
might attempt to use these cues in
interpreting the WISC. Consequently,
they compared the profiles
of 50 W-B, Form I, protocols with 
the patterns obtained from WISCs 
of the same children. The data were 
analyzed to determine the extent to 
which the relative position of a subtest
to the scaled mean occurring on
the one test was likely to be repeated 
on the other. They concluded that 
the similarity of profiles is not large 
enough to warrant prediction in individual
cases, and, while the IQs will
tend to be grossly similar, the clinical
sign approach cannot be carried over
from the W-B to the WISC.


It should be noted that Rabin and
Guertin (1951) in their review of the
W-B through 1950 conclude that
“. . . the scatter mountain gave birth to a
mouse” (p. 240). The numerous
studies they review suggested that
“. . . the various measures of scatter
and variability—the different pat- 
terns have succeeded in differentiat- 
ing [some] groups, but not individuals”
(p. 240).

Reading difficulty. The question of 
a WISC pattern for children with 
marked reading difficulties has caught 
the attention of several authors. 
Altus (1956) reported finding a dis- 
tinctive test pattern for children with 
severe reading disabilities. The rec- 
ords of 25 children (24 boys and 1 
girl) who showed a discrepancy of 
two years or more between their ex- 
pected and actual reading level were 
investigated. Coding and Arith- 
metic subtests were found to be sig- 
nificantly lower than Vocabulary, 
Digit Span, Picture Completion, Ob- 
ject Assembly, and Picture Arrange- 
ment at the .01 level of significance; 
Information was lower than Picture 
Completion at the .01 level and lower 
than Digit Span at the .02 level. 
Altus found that these results were 
quite similar to W-B results on illit- 
erate soldiers, but did not state her 
criteria for similarity. 

In an intelligence test of 10 subtests,
the chance that at least one
subtest would deviate significantly
from the mean of all of the others at
the .01 level is one in 10. This fact
takes on particular importance in the
above study, for there was no rationale
stated prior to the study as to why
one would expect any particular subtest
to deviate.
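
The one-in-10 figure can be checked with a short calculation. The sketch below treats the 10 subtest comparisons as independent tests, each with a .01 chance of a spurious deviation. WISC subtests are intercorrelated, so independence is a simplifying assumption and the result is only approximate.

```python
def prob_at_least_one(alpha, n_tests):
    """Chance of one or more significant results among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


p_any = prob_at_least_one(0.01, 10)
print(round(p_any, 3))  # 0.096, i.e., close to one in 10
```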


Graham (1952) wished to check 
the assertion by others that the W-B 
and WISC profiles of unsuccessful 
readers and psychopathic adolescents 
are similar. He went over the records 
of 96 unsuccessful readers (25% or 
more below the mean of the Wide 
Range Achievement Test) who had 
achieved either a Verbal or Perform- 
ance Scale score of 90 or above, com- 
paring the mean scatters with the 
previously obtained (but unpub- 
lished) scatter of adolescent psycho- 
paths. Graham reported no statistics 
but concluded that these patterns 
“correspond closely.” For the un- 
successful reader, he found Arith- 
metic, Digit Span, Information, Digit 
Symbol, and Vocabulary subtest 
averages below the mean, and Object 
Assembly, Picture Completion, Pic- 
ture Arrangement, Block Design, 
Comprehension, and Similarities sub- 
test averages above the mean. Only 
Arithmetic and Similarities devi- 
ated to a degree significant at the .01 
level. 

A comparison of these results with 
those of Altus (1956) finds Arith- 
metic to be significantly lower than 
the others in both studies. Of the 
other subtests mentioned in both 
studies six of the subtests deviate in 
similar directions while two deviate 
in opposite directions. 

Organic brain damage. One study 
concerns itself with finding subtest 
patterns characteristic of children 
with organic brain damage. Beck 
and Lam (1955) investigated the 
WISC records of 104 children re- 
ferred as possible candidates for a 
special class for the educable men- 
tally retarded. These children were 
placed into three groups: (a) organic 
(N=27), diagnosed by neurological 
examination; (b) suspected organics 
(N=48), inferred by psychological 
studies; and (c) non-organic (N=29),
for whom there was no evidence of
organicity from psychological evaluation
or developmental history.
Eleven more children were added to
Group (c) a year later. From a comparison
of the mean Verbal, Performance,
and Full Scale scores and of the
intersubtest scatter, they concluded
that (a) organics tend to score lower
on the WISC Full Scale than non-organics,
(b) organics tend to score
lower on the WISC Performance and
Full Scales than on the Verbal Scale,
(c) the possibility of organic damage
increases considerably as the IQ
drops below the 70-80 range, and (d)
the WISC does not show a characteristic
pattern of subtest scores for
organics as a group (as opposed to
nonorganic, possibly mentally retarded
children).


The Interpretation of Individual Subtest Scores

As noted above, Wechsler (1944,
1958) assumes that specific subtests
tap not only general intelligence, but
specific factors as well. The exact
nature of these factors, however, is 
far from clear. Some hints are given 
by Wechsler (1944, 1958) as to what 
he considers these factors to be for 
the adult scales; no help is given in 
interpreting the meaning of the sub- 
tests of the WISC when applied to 
children, however, beyond the state- 
ment that the subtests seem to meas- 
ure different factors in children than 
in adults (Wechsler, 1949). Balinsky 
(1941) found evidence to suggest that 
even within the adult scales the sub- 
tests do not measure the same factors 
at all age levels. 

Nowhere in the literature covered 
is there more than the barest begin- 
ning of the investigation of the vari- 
ous interpretive hypotheses. It 
would appear that most, if not all,
are based on an intuitive appraisal of
the content of the subtest and the in- 
formal observations of test admin- 
istrators. While some agreement 
might be found as to the most likely 
interpretation of some subtest scores 
(e.g., Digit Span), other subtests 
(e.g., Similarities) might produce 
wide disagreement. Even if one
could find agreement as to what a 
particular item should measure, the 
question of empirical validation 
would remain. It should be noted 
further that most, if not all, of the co- 
efficients of internal consistency 
would cast much doubt on any indi- 
vidual prediction. In the last analy- 
sis, it would seem that any prediction 
made on the basis of an individual 
subtest score is little more than a
rationalized hunch. A plausible
rationale certainly does not make a
valid measure.


SUMMARY

This article has reviewed the literature
concerning the WISC since
its publication in 1949 [...]
the WISC as a measure of psychological
variables, and [...]
together for the user of [...]
as a measure of intelligence, its
characteristics as a psychological
measuring device, and its [...]
instrument.

As summaries have been provided
whenever appropriate within the
body of the article, no attempt will
be made here to repeat all of the 
points considered or information 
brought out. A few general state- 
ments do, however, seem in order 
concerning some rather important 
areas of unmet need.
Aside, perhaps, from correlations
between the WISC and Stanford- 
Binet for normal white school chil- 
dren, further investigation of any of 
the problems discussed could add
significantly to our fund of knowledge,
both practical and theoretical,
concerning the WISC and its use. Three
areas in particular, however, stand 
out. 

1. The WISC does not have an 
adequate rationale. Much more 
thought and effort need to be de- 
voted to putting the WISC on a firm 
theoretical foundation. At present, 
both the assessment of the test’s con- 
tent validity and the long process of 
construct validation are severely 
handicapped by this lack of an ex- 
plicit rationale. 

2. The lack of investigations of the 
test’s predictive validity in its many 
common uses is appalling. At pres- 
ent, the test’s content and construct 
validities are not strong enough to 
support the use of the test without 
this criterion-oriented validation. It 
would seem that all possible occasions
should be taken to discover
experimentally if the WISC does indeed
predict what it is assumed to
predict. For example, children are
placed in classes for the mentally
retarded on the assumption that
they will respond to various learning 
situations in characteristic ways. 
How well does the WISC predict this 
response in a well-controlled, experi- 
mental situation? 

3. Much more systematic atten- 
tion should be given to investigations 
of the many practical problems in- 
volved in the use of the WISC as a 
measuring device. There appears to 
be strong reason to suspect that 
WISC scores are affected systemati- 
cally by many variables other than 
intelligence, but little information 
about the exact nature of these vari- 
ables and the relationships involved 
is available. Especially in need of 
systematic investigation is the effect 
on WISC scores of (a) variables in 
the relationship between examiner 
and examinee, (b) the circumstances 
of the examination, and (c) repeated 
administrations of the WISC. 

On the other hand, the WISC ap- 
pears to be a relatively well-stand- 
ardized test with many virtues. It 
correlates consistently well with other 
measures of intelligence, appears to 
be widely accepted and used, and, in 
general, seems to merit further research
and development.

The purpose of this discussion is to
analyze briefly a few statements presented
in a paper by Haggard, Chapman,
Isaacs, and Dickman (1959) entitled
“Intraclass correlation vs. factor
analytic techniques for determining
groups of profiles.” They are
concerned chiefly with comparing the 
“factor analytic” and “direct corre- 
lation” methods as two approaches 
for grouping profiles, a problem in 
typology. Careful reading of this 
paper reveals a number of ambiguous 
or incorrect presentations of technical 
and procedural matters. If these 
technical matters are not clarified, 
other investigators interested in ap- 
plying these two approaches to prob- 
lems of profile comparisons may be 
seriously misled. The present discus- 
sion will not consider the more funda- 
mental question of the usefulness, for 
any scientific purpose, of any method 
of grouping individuals on the basis 
of the “shape,” “level,” and “scatter”
attributes of a set of numbers
termed a profile (Cronbach &
Gleser, 1953).


FACTOR ANALYTIC TECHNIQUES


Readers of the paper by Haggard 
et al. may well gain the incorrect 
impression that the orthogonal cen- 
troid method, the centroid method 
with an oblimax solution, and the 
multiple group method are three, or 
at least two, different methods of 
factor analysis. The early misconception
of the multiple group and
centroid techniques as distinct methods
has been clarified in papers by
Guttman (1944, 1952) and Harman
(1954). The multiple group procedure
is simply a computing technique
which may be used to define one, 
but usually simultaneously two, or 
more composite variables or “factors” 
from a set of observations. Appro- 
priate application of the multiple 
group procedures will lead to any 
of the possible orthogonal or oblique 
axes or “factors.” The necessary sets
of weights for defining the desired
“factors” may be obtained in a
number of ways; one may use any of
the several “complete centroid” or
“group centroid” methods with or
without “rotational” transformations,
or one might apply cluster
methods based on an inspection of
the data. An investigator may use
some a priori “theoretical” or even
(within limits) an arbitrary formulation
for the weights or supplemental
information such as clinical ratings of
cases as “hypertensive,” “neurotic,”
or “psychotic,” the procedure used in
the paper under discussion.
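
A minimal sketch of the multiple group computation may make the role of the weights concrete. The correlation matrix and the grouping below are hypothetical, unities are left in the diagonal in place of communality estimates, and the helper names are illustrative; none of these values come from Haggard et al.

```python
import math

# Hypothetical correlations among four variables (unities in the diagonal).
R = [
    [1.00, 0.60, 0.20, 0.10],
    [0.60, 1.00, 0.15, 0.25],
    [0.20, 0.15, 1.00, 0.50],
    [0.10, 0.25, 0.50, 1.00],
]
# Hypothetical weights: column j is 1 for the variables assigned to group j.
W = [[1, 0], [1, 0], [0, 1], [0, 1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Rows become columns."""
    return [list(col) for col in zip(*a)]


RW = matmul(R, W)             # covariances of the variables with the group sums
C = matmul(transpose(W), RW)  # variances and covariances of the group sums
scale = [1.0 / math.sqrt(C[j][j]) for j in range(len(C))]

# Correlations of each variable with each composite (a "factor structure").
structure = [[RW[i][j] * scale[j] for j in range(len(C))] for i in range(len(R))]
# Correlations among the composites themselves.
phi = [[C[j][k] * scale[j] * scale[k] for k in range(len(C))] for j in range(len(C))]

for row in structure:
    print([round(v, 2) for v in row])
print([[round(v, 2) for v in row] for row in phi])
```

A different choice of W, whether taken from a centroid solution, from an inspection of clusters, or from clinical groupings, changes only the weights; the computation itself is unchanged.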

The “oblimax rotational solution” 
is simply one of the several available 
analytical procedures, each of which 
defines the “factors” by an objective 
transformation of certain types of 
data called “factor loadings.” As de- 
fined by Thurstone (1947), factor 
loadings are the “orthogonal projec- 
tions” of the variates, considered as 
vectors, upon a set of special axes 
added to the system and named 
“normals” or “reference axes.” The 
oblimax procedure is a method for 
“rotating” such a set of reference 
axes, either orthogonal or oblique, to 
positions defined by a mathematically
stated criterion.

HAROLD P. BECHTOLDT

The common misconception that 
the oblimax procedure¹ is restricted
to rotating orthogonal reference axes
may account in part for the inappropriate
comparison by Haggard et al.
(1959) of the oblimax data (Set A)
with the data from the multiple
group method (Set B) shown in their
Table 2, page 51. However, their
discussion of these two sets of data
in terms of factor loadings and
projections as well as their untenable
distinction between “oblique space”
and “orthogonal space” suggest a
lack of understanding of the basic
concepts of factor analysis as developed
by Thurstone (1947) and by
Holzinger and Harman (1941).

The Set B data and Set C data have been
checked by the writer and have been
found to be accurate to within ±.02.
[...] Holzinger
and Harman (1941), although Haggard
et al. refer to the Set B values as
the “factor pattern” and use the
term “factor structure” to refer to
the intercorrelations among the factors.
The Set B and Set C
data are derived from two somewhat
different definitions of “clusters” of
profile vectors.

The Set A values are entitled the
“centroid solution, oblimax rotation.”
This designation of the Set A
data is ambiguous. The Set A values
from an oblimax program might be
expected to be either orthogonal
projections on a set of “normals,” or
oblique projections on a set of “primary
axes,” these oblique projections
being referred to by Holzinger
and Harman as the “factor pattern.”
However, the values could also represent
two other sets of projections.
Because of what apparently are
computational errors in the Set A data in
addition, an unambiguous identification
of these values could not be
made without repeating the entire
set of calculations.

¹ The use of the oblimax solution with any
set of reference axes was clarified by Kern
Dickman.

Unfortunately, even when all the
computations are correct, the values
represented by the Set A data and by
the Set B data probably are not
comparable sets of values for an oblique
system. The oblimax data either are
linear regression coefficients expressing
the variates in terms of the factors
when the values are oblique projections
on the primaries or, as “factor
loadings,” are proportional to regression
coefficients. The Set B data are
covariances between the factors and
the (profile) variables. Covariances
and regression coefficients are not
generally comparable numerically,
[...] coefficients [...] defined
factors and observed variates
are related by a simple equation
(Holzinger & Harman, 1941, p. 327).
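
The numerical difference between regression coefficients and covariances is easy to see in a small case. The sketch below uses the standard identity that the factor structure equals the factor pattern postmultiplied by the matrix of factor intercorrelations. The two-factor values are hypothetical, and whether this is the exact form of the equation on the cited page is an assumption.

```python
# Hypothetical two-factor oblique solution (illustration only).
PHI = [[1.0, 0.4], [0.4, 1.0]]  # correlations between the factors
PATTERN = [                      # regression weights of variables on factors
    [0.8, 0.0],
    [0.7, 0.1],
    [0.1, 0.6],
    [0.0, 0.9],
]


def structure_from_pattern(pattern, phi):
    """Covariances of variables with factors: structure = pattern * phi."""
    n = len(phi)
    return [[sum(row[k] * phi[k][j] for k in range(n)) for j in range(n)]
            for row in pattern]


STRUCTURE = structure_from_pattern(PATTERN, PHI)
for p_row, s_row in zip(PATTERN, STRUCTURE):
    print(p_row, [round(v, 2) for v in s_row])
```

The first variable has a pattern coefficient of zero on the second factor yet a covariance of .32 with it, which is why the two kinds of values cannot be compared entry by entry unless the factors are uncorrelated.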

Contrary to the implications of
statements in the paper under discussion,
the values in Table 3,
page 51, are not relevant to this problem.


In a factor analysis, the distinctions
between reference [...] between
the three sets of data labeled Set B,
Set E, and Set F in Table 1 below
[...] “reference vector” [...]
pattern values might be referred to
by some investigator as showing the
“factor loadings” of 12 MMPI profiles
on three axes defined by the
clinical groupings of “hypertensive,”
“neurotic,” and “psychotic.” These
data were computed from the 12
MMPI profiles as published by Haggard
et al. The communalities were
provided by E. A. Haggard. No
“rotations” were made in obtaining
these results; a single grouping of
the profiles was made as specified by
Haggard et al. (1959).

INTRACLASS CORRELATION VS. FACTOR ANALYTICAL TECHNIQUES

TABLE 1

THREE ALTERNATIVE METHODS FOR REPRESENTING A GIVEN FACTOR ANALYSIS SOLUTION
OBTAINED WITHOUT ROTATION

             Set B                Set E                Set F
         Orthogonal           Oblique              Orthogonal
         Projections on       Projections on       Projections on           h²
         Cluster Primary      Cluster Primary      Normals or          Communality
         Axes                 Axes                 Reference Axes       Estimates
         (Factor Structure)   (Factor Pattern)     (Factor Loadings)

Profile     I    II   III        I    II   III        I    II   III
    1      88    49   -74       63    14   -25       36    12   -16        .801
    2      87    18   -70      101   -32   -02       58   -27   -01        .840
    3     101    68   -59      110    19    25       63    16    16       1.078
    4      56   100   -21       15    95    08       09    79   [?]         [?]
    5      26    96    07      -14   105    16      -08    87   [?]         [?]
    6      44    77   -33      -20    81   -33      -11    67   [?]        .644
    7      51    88   -19       18    82  0[?]       10    68    06        .801
    8     -40    09    82       36    11   110       21    09    71        .745
    9     -83   -22    99      -27    06    81      -15    05    52       1.007
   10     -84   -37    96      -18   -13    81      -10   -11    52        .983
   11     -62    05    96       02    23   102       01    19    66        .973
   12     -72   -40    90       07   -27    90       04   -22    58        .850

Note. Data computed from 12 MMPI profiles presented by Haggard et al. (1959) in Table 1, p. 49. The
correlations between “factors” or cluster primary axes, to two decimals, are given in Part II, Set B, of
Table 2, p. 51, of Haggard et al. (1959).
a The decimal points have been omitted in Cols. I to III inclusive in Sets B, E, and F. The decimal is
located two places to the left, i.e., -.74.

The Set B data are the orthogonal 
projections on the three centroid 
cluster vectors, a set of “primary 
axes,” while the Set E values are the 
oblique or Cartesian projections on 
these same cluster vectors. The Set F 
data are the orthogonal projections 
on the reference axes or normals,
each of which is orthogonal to all but 


one of the cluster vectors. (The sets 
of primary axes and reference axes 
are each collinear with the “inverse 
vectors” of the other set.) If another 
definition of the “factors” were used, 
as in an oblimax solution, three (or 
four) additional “factor matrices” 
might be computed. Comparisons
between the results for different 
definitions of the factors could be 
made in terms of any of the corre- 
sponding (comparable) factor ma- 
trices. 
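The relation between the Set E and Set F values can be checked against the rows of Table 1 that are legible in both sets. The sketch below assumes the standard result that each column of reference-axis loadings is a constant multiple of the corresponding column of pattern values; the names `TABLE_1` and `column_constant` are illustrative only, and the constants are estimated from two-decimal entries, so small discrepancies are expected.

```python
# Rows of Table 1 that are legible in both Set E and Set F (decimal points restored).
# Key: profile number. Value: (Set E pattern values I-III, Set F loadings I-III).
TABLE_1 = {
    1: ((0.63, 0.14, -0.25), (0.36, 0.12, -0.16)),
    2: ((1.01, -0.32, -0.02), (0.58, -0.27, -0.01)),
    3: ((1.10, 0.19, 0.25), (0.63, 0.16, 0.16)),
    8: ((0.36, 0.11, 1.10), (0.21, 0.09, 0.71)),
    10: ((-0.18, -0.13, 0.81), (-0.10, -0.11, 0.52)),
    11: ((0.02, 0.23, 1.02), (0.01, 0.19, 0.66)),
    12: ((0.07, -0.27, 0.90), (0.04, -0.22, 0.58)),
}


def column_constant(col):
    """Least-squares constant d (line through the origin) with F = d * E for one factor."""
    pairs = [(e[col], f[col]) for e, f in TABLE_1.values()]
    d = sum(e * f for e, f in pairs) / sum(e * e for e, _ in pairs)
    worst = max(abs(f - d * e) for e, f in pairs)  # largest departure from proportionality
    return d, worst


results = [column_constant(col) for col in range(3)]
for name, (d, worst) in zip(("I", "II", "III"), results):
    print(f"Factor {name}: Set F is about {d:.3f} x Set E (largest discrepancy {worst:.4f})")
```

If the departures from proportionality stay within rounding error, Sets E and F carry the same information up to a rescaling of each column, which is why either one may be loosely called "factor loadings."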

The “communalities” or “common
factor variances” of the 12 MMPI
profiles are shown in the h² column of
Table 1. Since three of these values
exceed unity, the upper limit for
“admissible” values of the communalities,
the analysis is a “Heywood” case and not
a “proper” factor analysis (Thurstone, 1947).
The presence of five residuals (computed
from the Set B data of Table 1) exceeding
.04 in absolute value and of the three
communalities of greater than unity
indicates that another factor probably
should be defined; defined, however, only
if a factor analysis is considered worth making with
intercorrelations based on 9 observa- 
tions (here, 9 MMPI scales). 
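A rough check for the Heywood condition can be made from the two-decimal entries of Table 1. The sketch below assumes the usual identity for an oblique solution, in which a communality is reproduced by the sum over factors of pattern value times structure value. The h² column itself was supplied separately, so these products only approximate it. Profile 7 is omitted because its row is not fully legible, and the names used are illustrative.

```python
# Set B (structure) and Set E (pattern) entries from Table 1, decimal points restored.
# Profile 7 is left out because part of its row is illegible in the source.
STRUCTURE_AND_PATTERN = {
    1: ((0.88, 0.49, -0.74), (0.63, 0.14, -0.25)),
    2: ((0.87, 0.18, -0.70), (1.01, -0.32, -0.02)),
    3: ((1.01, 0.68, -0.59), (1.10, 0.19, 0.25)),
    4: ((0.56, 1.00, -0.21), (0.15, 0.95, 0.08)),
    5: ((0.26, 0.96, 0.07), (-0.14, 1.05, 0.16)),
    6: ((0.44, 0.77, -0.33), (-0.20, 0.81, -0.33)),
    8: ((-0.40, 0.09, 0.82), (0.36, 0.11, 1.10)),
    9: ((-0.83, -0.22, 0.99), (-0.27, 0.06, 0.81)),
    10: ((-0.84, -0.37, 0.96), (-0.18, -0.13, 0.81)),
    11: ((-0.62, 0.05, 0.96), (0.02, 0.23, 1.02)),
    12: ((-0.72, -0.40, 0.90), (0.07, -0.27, 0.90)),
}


def reproduced_communality(structure, pattern):
    """Approximate h2 as the sum of pattern x structure products over the factors."""
    return sum(s * p for s, p in zip(structure, pattern))


communalities = {
    profile: reproduced_communality(s, p)
    for profile, (s, p) in STRUCTURE_AND_PATTERN.items()
}

# A communality above 1.0 is inadmissible and marks a Heywood case.
heywood_profiles = sorted(k for k, h2 in communalities.items() if h2 > 1.0)
for profile, h2 in sorted(communalities.items()):
    flag = "  <-- exceeds unity" if h2 > 1.0 else ""
    print(f"profile {profile:2d}: h2 about {h2:.3f}{flag}")
```

Because the inputs are rounded to two decimals, a value only slightly above unity should be read as suggestive rather than exact.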


DIRECT CORRELATION METHODS 


A purportedly new approach to 
the problem of determining profile 
groups is presented by Haggard et al. 
(1959) through the use of “empirical 
criterion profiles,” i.e., the average 
profiles for a set of cases grouped in 
accordance with some set of rules, 
rules which may include nonprofile 
information (p. 51-52). The ap- 
proach is hardly new since Burt in 
1941 makes reference to his use of it 
in 1931 (Burt, 1937, 1941).

A more basic point arises from the 
discussion by these authors of the 
comparison, with reference to their 
Table 2, of their Set B data (the 
covariances between cluster factors 
and 12 MMPI profiles) with their 
Set C data (the correlations between 
three average criterion profiles and 
12 MMPI profiles). They say that 
the methods agree closely, that the 
results are practically identical, but 
that they are not prepared to argue 
that either set is approximated by 
the other (p. 52). Apparently the 
authors are not aware of the fact that 
the results are not identical solely be- 
cause of the defining procedures they 
happen to have used. The insertion
of communalities rather than unities
in the matrix of correlations for the
Set B data is the first source of the
small differences. The second source
is their definition of average profiles 

in terms of the usual MMPI scaled 
scores rather than in terms of stand- 
ard scores (over the 9 scales); this 


second source of differences intro- 
duces into their definitions differen- 
tial weighting of the profiles in terms 
of unequal profile standard devia- 
tions. The “criterion profiles” can- 
not, as they suggest, be considered 
as “computed on equalized means 
and sigmas” (p. 54) since the profile 
standard deviations are not equal. 

The equivalence of the “correla- 
tion of sums” methods and of the 
“sum of correlations” method for 
standard scores noted above as well 
as the ease of computation of correla- 
tions with totals or subtotals of 
scores is well known and has been 
discussed in these terms in some de- 
tail by Holzinger and Harman (1941),
Holzinger (1944), and Richardson 
(1941) among others. Because of 
this equivalence, it is not correct to 
suggest that criterion profiles are 
more “intuitively meaningful” as a 
“definition of a factor” than are 
linear combinations of correlations 
(p. 53); the differences are only 
computational ones or are differences 
in definitions. This same basic point 
is also involved in the ambiguous 
statement dealing with the existence 
of a “general factor” in the footnote 
to Table 4, page 55. Furthermore if
any appreciable amount of “reassignment
of variables to new groups,”
a problem mentioned by the authors 
(p. 55), were to arise, the flexibility 
available in the method using sums of 
weighted correlations or covariances 
for forming new groups might well 
overbalance the initial computational 
convenience of the “correlation of 
sums” method. 
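The equivalence for standard scores can be illustrated numerically. The sketch below uses made-up profiles over 9 scales (random numbers, not MMPI data) and helper names chosen for illustration. It compares the correlation of a target profile with the sum of the standardized group profiles against the sum of the separate correlations divided by the square root of the sum of all intercorrelations within the group.

```python
import math
import random


def pearson(x, y):
    """Product moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Convert a profile to standard scores over its own scales (mean 0, sigma 1)."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


random.seed(0)
N_SCALES = 9
# Hypothetical group of four profiles with deliberately unequal scatter, plus one target.
group = [[random.gauss(50, sd) for _ in range(N_SCALES)] for sd in (5, 10, 15, 20)]
target = [random.gauss(50, 10) for _ in range(N_SCALES)]

# "Correlation of sums": correlate the target with the sum of standardized profiles.
criterion = [sum(vals) for vals in zip(*(standardize(p) for p in group))]
r_of_sum = pearson(target, criterion)

# "Sum of correlations": combine the separate correlations algebraically.
sum_of_r = sum(pearson(target, p) for p in group)
norm = math.sqrt(sum(pearson(p, q) for p in group for q in group))
r_from_sums = sum_of_r / norm

# A criterion built from raw scores weights each profile by its standard deviation.
raw_criterion = [sum(vals) for vals in zip(*group)]
r_raw = pearson(target, raw_criterion)

print(f"standard-score criterion: {r_of_sum:.6f}")
print(f"from sums of correlations: {r_from_sums:.6f}")
print(f"raw-score criterion:       {r_raw:.6f}")
```

The first two figures agree to within floating-point error. The third will generally differ from them, since the profiles with larger scatter dominate a raw-score sum.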

In presenting a second direct correlation
method, that of the intraclass
correlations, the authors emphasize the
well-known transformation to standard
scores imposed by the definition of a
product moment correlation. As a result
of this definition the product moment
correlation is limited to providing a
definition of “profile shape”; one cannot
combine with this measure of “shape” the
independently defined profile attri- 
butes of “level” and “scatter.” Sev- 
eral distinct composites of the three 
profile attributes, as pointed out by 
the authors, can be defined by an in- 
vestigator using the intraclass co- 
efficient. It seems worth noting, 
however, that Cronbach has recently 
raised a serious question as to the 
advisability of defining such “global” 
or composite concepts in his treat- 
ment of dyadic scores (1958). 
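The contrast between the two coefficients can be made concrete with a small sketch. It assumes a one-way intraclass coefficient in which each scale is treated as a class holding the two profile scores, R = (BMS - WMS) / (BMS + (k - 1) WMS) with k = 2; that choice of form, the function names, and the nine scores are all hypothetical and serve only as illustration.

```python
import math


def pearson(x, y):
    """Product moment correlation: sensitive to profile shape only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intraclass(x, y):
    """One-way intraclass correlation for two profiles (each scale is a class, k = 2)."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    class_means = [(a + b) / 2 for a, b in zip(x, y)]
    bms = k * sum((m - grand) ** 2 for m in class_means) / (n - 1)
    wms = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, class_means)) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


# Hypothetical profile over 9 scales (mean 60).
profile = [55, 60, 72, 48, 65, 58, 70, 62, 50]
shifted = [v + 15 for v in profile]               # same shape and scatter, higher level
stretched = [60 + 2 * (v - 60) for v in profile]  # same shape and level, doubled scatter

for label, other in (("level shifted", shifted), ("scatter doubled", stretched)):
    print(f"{label:15s}  r = {pearson(profile, other):.3f}  R = {intraclass(profile, other):.3f}")
```

In this sketch r stays at unity under both changes, while the intraclass coefficient drops, which is the sense in which the latter folds level and scatter into a single composite index.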

Only two additional points relating 
to the presentation of the intraclass 
correlation will be made here. The 
first point is that differences in means
and variances are not, as the authors
state, like “poor Clementine,” “lost
and gone forever when r is used”
(p. 53). The data are available and
might even be used to test hypotheses
regarding certain interactions as well
as the homogeneity of the means
when the study is properly conducted
(Lindquist, 1953).

The second point involves the
situation in which an investigator
“may wish to form groups in terms
of one or more a priori ‘ideal’ profiles
which are based on theoretical
considerations” (p. 55). It is simply not
correct to state that “with the procedures
discussed thus far, it is not
possible to form groups around such
a priori profiles” (p. 55). “Theoretical”
profiles have been written and
the profiles of individuals compared 
with “hypothetical types” or with 
“theoretical standard persons” for 
many years by Burt (1941) and 
others. Every possible comparison 
between any arbitrary sets of num- 
bers and one or more profiles can also 
be made; in particular, observed
profile “levels,” “scatter,” and
“shape” indices can be compared to
a priori values of these defined concepts.
Correlations with such a priori
“shape” values could be included
in a factor analysis or “direct
correlation” study as well as in an
investigation using intraclass corre- 
lation coefficients. Whether any of 
these comparisons is empirically use- 
ful is another matter. 


CONCLUDING REMARKS 


A few of the more fundamental 
misconceptions or technically incor- 
rect statements contained in the 
paper by Haggard et al. (1959) have 
been noted. Other technically ques- 
tionable discussions include the au- 
thors’ presentation of the orthogonal 
centroid method (p. 50), the possible 
assignment of cases to groups defined 
by factor analytic methods (p. 56), 
and the testing of statistical hypoth- 
eses using sets of related observations 
(p. 48, p. 57).


REPLY TO PROFESSOR BECHTOLDT’S CRITIQUE


ERNEST A. HAGGARD


After rereading Bechtoldt’s critique
of our paper (Haggard, et al., 1959),1
I still find it difficult to understand
fully why he became so exercised
over it. True, many of the comments
in his section on “Factor Analytic
Techniques” are interesting and
informative—and generally correc- 
tive—but they are also essentially ir- 
relevant to our paper and its purpose. 

It is obvious that Bechtoldt likes 
factor analysis and is steeped in it. 
It should also be obvious that I have 
a somewhat negative loading on the 
factor entitled “One should use factor 
analysis regardless of whether other 
methods can get the job done as well 
or better.” There is a clear difference of
opinion as to whether, when, or what 
factor analytic techniques should be 
used in particular cases. Fortu- 
nately or unfortunately, I have never 
been enamoured of this group of 
techniques, and if I were to deal 
with problems of the sort usually 
associated with these methods, I 
would prefer an approach to data 
analysis of the type proposed by 
Creasy (1957).2 


1 As senior author, I take full responsibility
for any “erroneous,” “misleading,” “misconceived,”
or “ambiguous” statements in the
paper criticized by Bechtoldt, and so am
replying to his comments.

2 In a thoughtful and sobering discussion
published several years ago, Horst (1950) out- 
lined some of the well-known limitations of 
factor analysis, such as the large amount of 
data needed, the time required to perform the 
computations, and a number of technical 
difficulties. With reference to the first point, 
Horst stated that “If you want to come out 
with results of any consequence you should 
have 50 or 60 variables or tests and at least 
500 cases.... I would not be inclined to 
take very seriously the results of any factor 


Some of Bechtoldt’s remarks in his 
sections entitled “Direct Correlation 
Methods” and “Concluding Remarks” 
call for more specific comment, pri- 
marily because I think he misses the 
point from time to time. For exam- 
ple, his apparent reference to the dis- 
cussion under the first italicized head- 
ing on our p. 54, namely, “When profile
means and sigmas are equalized”
has to do with the simple fact that for
product-moment r’s (over the nine
subscales or profiles for any pair of
Ss), mean = 0 and sigma = 1 by definition.
This is just the point which he
emphasizes a couple of paragraphs 
later when he speaks of “the well- 
known transformation to standard 
scores imposed by the definition of a 
product moment correlation.” Also, 
it may be that the expression “poor
Clementine” is unfortunate stylistically,
but I still fail to see how the
coefficient r—by itself—provides
information as to the means and sigmas
of the variates correlated. Of course
the means and sigmas are involved in
the computation of r but, even
though they may be of great use, 
they are all too often ignored in 
practice. Along these same lines, 
Bechtoldt should have gone much 
further than he did at the end of 
this paragraph: he should have em- 
phasized along with Tukey (1951), 
Lindquist (1953), myself (1958) and 


analysis involving psychological tests, which 
falls far short of 10,000 man hours of testing 
time” (p. 53). Creasy (1957) also mentioned 
various limitations of factor analytic tech- 
niques, as have so many other workers in this 
area. But these matters, which are quite rele- 
vant to Bechtoldt's remarks, are too numer- 
ous and complex to discuss here.


many others that the estimation of
components of variance is a much 
more useful approach to answering 
many research questions than r is or 
ever can be. 

In his next paragraph, Bechtoldt
has us on the ropes when he cites
the statement “with the procedures
discussed thus far, it is not possible to


observe that, although multivariate
data can be a


A REVIEW OF HEARING IN AMPHIBIANS AND REPTILES1


THOMAS E. McGILL2
Williams College 


In the course of the evolutionary 
development of the vertebrate ear 
important changes took place as ani- 
mals became terrestrial. The aquatic 
ear of the fishes was next to useless 
without the development of special- 
ized accessory structures for the pur- 
pose of matching the impedance be- 
tween aerially conducted sounds and 
the fluids of the inner ear. The mid- 
dle ear mechanism of man is such an 
impedance-matching device. In the 
lower vertebrates the function is 
served by one rod-like bone, called 
the columella, which extends from 
the tympanic membrane to the inner 
ear. This structure is typical in both 
amphibians and reptiles. 

Knowledge of the structure of the 
auditory mechanism in these classes 
far exceeds knowledge of the func- 
tion. Disagreement exists on both 
theoretical and empirical grounds as 
to whether the hearing organs of cer- 
tain modern amphibians and reptiles 
are vestigial or rudimentary. If they 
are vestigial, then, presumably, the 
animals concerned are deaf; if they 
are rudimentary, the animals should 
have some sense of hearing. 

Presumptive evidence of hearing 
is obtainable from electrophysiologi- 
cal studies. Positive results from 
such studies demonstrate functional 
activity of the auditory system from 
the drum membrane to that part of 
the system from which the recording 
was taken. However, proof of hear- 
ing can only result from some be- 


1 This research was supported by Grant
G-6119 from the National Science Foundation.

2 At present, United States Public Health
Service Postdoctoral Fellow, University of
California, Berkeley, California.


havioral manifestation on the part 
of the animals to sounds. These 
behavioral manifestations are of two 
general types: some natural reaction 
of the animals to sound, or a trained 
reaction. Following is a review of 
studies concerned with hearing in 
two orders of the class Amphibia 
and three orders of the class Reptilia.


Amphibia


Order Urodela—salamanders. Fer- 
hat-Aket (1938) was able to demon- 
strate that Amblystoma mexicanum 
and other urodele species could hear 
within the frequency limits of 32 cps 
to 244 cps. Head raising, snapping, 
and restless movements were condi- 
tioned to the sounds of Edelmann 
whistles, mechanically actuated tun- 
ing forks, organ pipes, and a cello 
when these stimuli were followed by 
food. Kuroda (1926), using breath- 
ing rate changes to acoustic stimula- 
tion, could find no evidence of hear- 
ing in six newts. His stimuli were 
hand claps, whistles, tuning forks, 
electric bells, pistol shots, a har- 
monica, and an Edelmann whistle. 

Order Salientia (Anura)—frogs 
and toads. Yerkes (1905) has pre- 
sented evidence of hearing in frogs. 
He found that whistles and bells de- 
creased the rate of respiration; he 
noted that stroking one frog and 
causing him to croak would result 
in other frogs croaking; he found 
that the reaction time to visual or 
tactual stimuli could be reduced if 
an auditory stimulus had been 
sounded within one second before 
the visual or tactual stimulation. 
The latter procedure resulted in an 
estimate of an effective frequency 
range of 25 cps to 5,000 cps. Bruyn
and Van Nifterik (1920) discovered
that a sound signal such as Yerkes 
used with frogs would also reduce 
reaction time to tactual stimuli in 
the toad. The sound signal in the 
toad was effective for periods up to 
10 seconds. 

Bajandurow and Pegel (1932) re- 
ported that breathing changes and 
jumping were readily obtained in 
frogs to “whistle-tones” when shock 
was used as the unconditioned stim- 
ulus. The responses were unstable 
and Ss had to be retrained each day. 
Corbeille (1929) used natural breath- 
ing rate changes in frogs as an index 
of hearing for the sounds produced 
by a Cambridge vibrator. He esti- 
mated the effective frequency range 
to be 100 cps to 8000 cps. Kuroda 
(1926) also used natural breathing 
rate as an index and found “some 
evidence for hearing in adult frogs 
and toads.” 

The upper frequency limit re- 
ported by Yerkes (1905) and Cor- 
beille (1929) is surprising in view of 
the results of a recent study by 
Strother (in press). Strother at first
made an unsuccessful attempt to
condition bullfrogs to pure tones using
shock and leg-flexion. He then
operated upon the animals and
recorded electrical responses
from the inner ear.


Reptilia


Order Crocodilia—crocodiles and
alligators. Beach (1944) reported


57 cps tone was
presented. Other behavioral manifestations
of hearing were evident up
to 341 cps. Three smaller animals
did not roar but showed evidence of 




hearing by turning to the sound 
source or by snapping. In reference 
to the problems involved in the care 
of reptiles, Pope (1950, p. 327) 
wrote, “In contrast to most other 
reptiles, crocodilians will learn to 
come when called at meal times,” 
which, of course, implies that they 
can hear. 

Adrian (1938) and Adrian, Craik, 
and Sturdy (1938) successfully re- 
corded both “cochlear” potentials 
and action potentials from alligators. 
Wever and Vernon (1957), using the 
spectacled caiman (Caiman sclerops), 
recorded electrical responses from 
the inner ear for tones in the fre- 
quency range of 200 cps to 6000 cps. 
The frequencies to which the ani- 
mals were most sensitive were lo- 
cated between 100 cps and 3000 cps. 

Order Chelonia—turtles and tor- 
toises. Concerning hearing in turtles 
Munn (1955, p. 97) has written, 
“Snakes, turtles, and similar verte- 
brates are believed to be entirely 

No experiment on these ani- 
e least evidence 
hear noise.” 
Actually, there are two rather obscure
studies which profess to demonstrate


the turtle,


made the positive signal
for feeding, Poliakov (1930)
conditioned head withdrawal to
a variety of tones and
noises in Emys orbicularis. Kuroda
(1923, 1925, 1926) on the other
hand, failed to confirm Andrews’
findings.

Recently, electrophysiological studies
dealing with turtles have been
carried out by Wever and Vernon
(1956a, 1956b, 1956c). They found
that the turtle’s ear was uniformly
highly sensitive for faint tones in the
region from 100 cps to about 700
cps. The potentials then fell off rapidly
up to 3000 cps, beyond which
point injurious intensities were re- 
quired to produce a measurable re- 
sponse. 

Order Squamata—snakes and liz- 
ards. Concerning the evolution of 
the snake, Huxley (1953, p. 67)
wrote “. . . all the circumstantial
evidence makes it reasonably certain
that the ancestors of the group
had to pass through a stage of existence
underground as deaf, half-blind,
and legless burrowing lizards.”
After re-emerging, the line “. . . reacquired
much of its power of vision
(but not of hearing) and achieved
new evolutionary success as snakes.”
Kuroda (1926) failed to find any be- 
havioral evidence for hearing, and 
Adrian (1938) could record no re- 
sponse from the eighth cranial nerve 
in the grass snake to sounds; how- 
ever, the nerve did respond to tactile 
stimulation. Wever and Vernon 
have recently carried out investiga- 
tions of the electrophysiological re- 
sponses of the inner ear of several 
species of snakes.3 The resulting
potentials were found to resemble
those of turtles in terms of range, 
and to be only somewhat reduced in 
terms of magnitude. 

The snake’s close relative, the 
lizard, definitely demonstrates hear- 


3 E. G. Wever and J. A. Vernon. Personal
communication. 1959.


ing according to Kuroda (1926). 
Kuroda found that lizards would 
open their eyes to tonal stimulation. 
He likened this reflex to Preyer’s 
reflex “as a reliable clue to make sure 
objectively of the normal state of 
audition.” Using a Galton whistle 
he established the upper frequency 
limit at 9675 vs per second (4837.5 
cps). 


Summary 


If one is willing to accept the be- 
havioral evidence available, then 
adult amphibians—salamanders, 
frogs and toads—can hear. Among 
the reptiles, alligators and lizards 
can hear, snakes cannot, and the 
hearing of turtles is in question. The 
electrophysiological evidence indicates
functional activity of the auditory
system up to, and including,
the sensory cells in every species 
tested. Because there has not been a 
single behavioral study successfully 
carried out since the development of 
modern electronic devices for the 
production and control of sound, we 
know nothing of the animals’ ability 
to hear pure tones, nor of their abso- 
lute limens in terms of intensity, nor 
is there any certainty of the fre- 
quency range for any species. The 
present state of knowledge of hearing 
in amphibians and reptiles is not 
commensurate with the importance 
of these classes in the study of the 
evolution of the sense of hearing.


STIMULUS GENERALIZATION1


Stimulus generalization (SG) is an
empirical phenomenon which has, of
late, been seeing heavy duty as an
explanatory construct in many disparate
situations. It has been used
in theoretical explanations of
discrimination learning (Spence, 1936),
of transposition (Spence, 1937), ver- 
bal learning (Gibson, 1940), psycho- 
analytic displacement (Miller, 1948), 
the behavior of brain-damaged indi- 
viduals and schizophrenics (Med- 
nick, 1955, 1958a), cross-cultural re- 
search (Hull, 1950a), projective tech- 
niques (Moylan, 1959), and psycho- 
therapy (Magaret, 1950). It seems 
likely that these explicatory uses of 
SG are only a beginning. Inasmuch 
as we are probably never exposed to 


1 This paper was made possible by grants 
from the National Science Foundation 
(G-3855) and the American Philosophical So- 
ciety (Grant No. 2132) to the senior author. 
Freedman was at Harvard University when 
this paper was written; Mednick was at the


exactly the same stimulus situation
more than once, all responses to 
stimulus “repetition” might justifi- 
ably be attributed to SG. This “in- 
tellectual imperialism” is condoned 
and encouraged by suggestions that 
differential reaction time responses 
(Gibson, 1939) and most of psycho- 
physics (Brown, Bilodeau, & Baron, 
1951) may in large part be under- 
stood in terms of SG. While we have 
some sympathy for these viewpoints, 
the purpose of this article is to pro- 
vide a concise summary of empirical 
findings concerning research on SG 
and to discuss evidence bearing on 
some important issues. 

Organization of the article. After 
first defining SG we shall review the 
area of sensory generalization. Here 
we will be referring to generalization 
along stimulus continua that can be 
measured on a physical scale (e.g., 
distance, intensity, size). Next we 
will discuss the effect of variables 
such as drive and degree of training 
on SG. We shall conclude with a dis- 
cussion of issues around which much 
research has revolved. 

Definition. Stimulus generaliza- 
tion can be said to occur when a re- 
sponse, previously trained to be 
elicited by Stimulus 0, can also be 
elicited by test stimuli similar to 0. 
The gradient of stimulus generaliza- 
tion (GSG) can be said to be observed
if the strength of these generalized
responses (measured by frequency, 
latency, etc.) varies as an orderly 
function of the physical difference 
between the test stimuli and Stimulus 
0. (This is what we will mean when 
the terms SG and GSG are used in the 
remainder of this article.) Through 
loose usage, these terms have been 
invested with a rather large number 
of different connotations. We are in 
agreement with Brown, et al. (1951) 
who have pointed up the importance 
of distinguishing “between the em- 
pirical phenomenon of generalization 
on the one hand and theories or hy- 
potheses about generalization on the 
other.” The use of the terms in a 
hypothetical manner connotes non- 
observable instigative factors while 
the use of the terms in an empirical 
manner connotes nothing more than
SG as operationally defined
above. Unless the term SG is
specifically otherwise qualified, we
shall be referring to the empirical
usage.

In the main, studies in this area
use both the classical and the instru- 
mental conditioning paradigms. After 
initial conditioning, tests for gen- 
eralization are made by presenting 
stimuli which vary in their similarity 
to the conditioned stimulus (CS). 
Usually SG testing is continued over 
a series of extinction trials with some 
reinforced (booster) trials with the
CS interspersed to maintain response
strength. Following the custom of 
the Russian and American literature, 
the original CS will be denoted Stim- 
ulus 0 and generalization stimuli 
(GS) as Stimulus 1, Stimulus 2, 
Stimulus 3, etc., in terms of their 
ordinal similarity to Stimulus 0. 
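The operational definition can be sketched in a few lines. The response counts below are invented for illustration and come from none of the studies reviewed; the function names are likewise hypothetical. The sketch converts counts to response proportions for Stimulus 0 through Stimulus 3 and asks whether strength falls off in an orderly way with ordinal distance from Stimulus 0.

```python
def generalization_gradient(responses, trials):
    """Proportion of test trials yielding a response, ordered Stimulus 0, 1, 2, ..."""
    return [r / t for r, t in zip(responses, trials)]


def is_orderly_decreasing(gradient):
    """True when response strength never rises as distance from Stimulus 0 grows."""
    return all(a >= b for a, b in zip(gradient, gradient[1:]))


# Hypothetical counts of responses on 40 test trials per stimulus.
responses = [38, 27, 15, 6]
trials = [40, 40, 40, 40]

gradient = generalization_gradient(responses, trials)
# Strength relative to the trained stimulus, a common way of comparing gradients.
relative = [g / gradient[0] for g in gradient]

for i, (g, rel) in enumerate(zip(gradient, relative)):
    print(f"Stimulus {i}: proportion {g:.3f}, relative to Stimulus 0 {rel:.3f}")
print("orderly gradient:", is_orderly_decreasing(gradient))
```

A frequency measure is used here only for convenience; latency or amplitude could be substituted, with the direction of the comparison reversed where smaller values indicate greater strength.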


Spatial Generalization 


Studies by Bass and Hull (1934), 
and by Grant and Dittmer (1940),
made use of a tactual-vibratory conditioned
stimulus applied directly to
some point on S's body (Stimulus 0). 
Shock to the right wrist was used as 
the unconditioned stimulus (UCS); 
the galvanic skin response (GSR) 
served as the conditioned response 
(CR) and generalization response 
measure. Tests for generalization 
were made by applying the tactual- 
vibratory stimulus to points on S's 
body at varying distances from 
Stimulus 0. Thus, instead of varying 
the disparity between the CS and the 
UCS with changes in the physical 
properties of the stimulus, they
varied the point of application to the 
body. A typical situation in the Bass 
and Hull study had S’s left shoulder 
as Stimulus Point 0, while three 
other points 16 inches apart on the 
body were used as generalization test 


responses to these 
dasa function of the 


above were observed, 
Part of the Study, E t 


to one-fourth of 


tested 1, 2, and 3 for 
effects of this extincti 


the group that had the GSR measured
on their right hand.

spaced their four tactual-vibratory
stimulus points four inches apart on 
the back and one inch apart from the 
wrist to the index finger. The gradi- 
ents under these conditions were con- 
cave upward. 

In an attempt to test the applica- 
bility of SG findings of conditioning 
experiments to verbal learning situa- 
tions, Gibson (1939) carried out a 
study somewhat similar to the Bass 
and Hull investigation. Two groups 
were used, one having the tactual- 
vibratory stimuli placed 4 in. apart 
across their backs and the other hav- 
ing them placed 4 in. apart down 
their backs, forming two separate 
generalization continua. Ss were in- 
structed to indicate verbally when 
the tactual-vibrator was applied to 
Stimulus Point 0 and to inhibit 
verbal responses when stimulated at 
the other stimulus positions. The 
reaction time of the verbal response 
was measured by means of a voice 
key. For both the vertical and hori- 
zontal continua the GSGs of fre- 
quency of responses showed signifi- 
cant differences between Stimulus 
Points 0 and 3, 1 and 3, and 0 and 1. 
No significant gradient of reaction 
times was obtained. In fact, the 
average latency of response to the 
GSs was less than that to Stimulus 0. 
Gibson explained this as probably 
being due to the instructions inhibiting 
long-latency “false” responses to the 
GSs. Another interesting finding was 
an upturn at the end of the GSG 
when vibrators were distributed horizontally;
in the horizontal position,
Points 0 and 3 were bilaterally symmetrical
and this symmetry of location
is offered as an “explanation”
of the upturn. This explanation was 
also used by Anrep (1923) when his 
dogs showed similar responses to bi- 
lateral symmetrical stimulation; An- 
rep concluded that symmetry was 
an effective elicitor of generalization 
responses and that stimulation of 


the point bilaterally symmetrical 
with Stimulus 0 was as effective as 
stimulation of the 0 point itself. 

It might be maintained that this 
voluntary verbal response situation 
is too different from conditioning to 
be considered an instance of SG and 
that all Gibson observed was a failure 
to discriminate Stimulus 0 from the 
GSs. However, before training, she 
tested S's ability to discriminate be- 
tween the stimulus points. The 
group that discriminated better was 
the one which later generalized more, 
casting doubt on any “failure of
discrimination” hypothesis. It could
also be argued that the so-called gen- 
eralization responses were nothing 
more than chance errors which were 
due to hurrying the response. This 
argument might have some weight if 
it were found that speed of reaction 
and number of generalized responses 
were positively related. However, 
there was no such relationship. This 
same finding for a voluntary response 
SG study is reported elsewhere (Med- 
nick, 1955). In fact, the group that 
had slightly longer reaction times was 
the group that generalized more. In 
any case, Gibson’s operations fulfill 
our initial definition of SG. 

In an attempt to demonstrate 
spatial generalization in a sensory 
modality other than tactual, Brown, 
Bilodeau and Baron (1951) devised 
a procedure for testing for SG along 
a visual-spatial continuum. Their 
apparatus consists of a horizontal 
row of seven lamps and a reaction 
time key for S. S is told that this is 
a test of reaction time and that when 
the center lamp, Stimulus 0, is 
flashed, he should release the reac- 
tion key as quickly as he can. He is 
warned that the other lamps will also 
be flashed and that he should not re- 
spond to these peripheral lamps. 
After a series of training trials with 
Stimulus 0, the testing period begins. 
This period consists of test trials with
the peripheral lamps interspersed
among booster trials with Stimulus
0. In two separate experiments symmetrical
gradients were found. No


problem concerns the fact that in-
structions rather than reinforcements
produce the major behavioral varia-


eralization-like effects.” These “gen-
eralization-like effects”
commonly used by Brown and his
associates) have proven to be sur-
prisingly parallel to the effects result-
ing from classical conditioning-gen-
eralization studies.

Andreas (1954) pointed out that 


strength, but could also be ascribed to 


tendencies 
(here he is talking in nonempirical 


e tested to 
See if gradients could be obtained us- 


ing this type of apparatus without 


to condone implicitly but 
age false reactions,” Th 
vised a new Procedure usin 

asic apparatus 
this Problem, 


(20%). The center amp wins 
frequently than the T 


lamps. He reports a regular, statistically
significant gradient of frequency
of response. No regular latency
gradient was obtained though
latency measures were taken. 

The failure of studies involving 
the generalization of voluntary re- 
sponses to find latency gradients has 
been of some interest to researchers 
in this field. It has probably at- 
tracted such attention because of the 
relative ease with which latencies of 
voluntary responses may be measured
(as compared with latencies of
GSR responses). Gibson’s explanation
of this failure has been generally
accepted. As briefly pointed out 
above, she suggested that with in- 
hibitory instructions, potential long 
latency responses would be withheld 
if a GS were the test stimulus, but 
would be allowed to occur to the CS. 
On the basis of this, Rosenbaum 
(1953) suggested that the usual gradi- 
ent might be obtained with latency 
data if Ss were given a set to respond 
very quickly, thus counteracting the 
effect of the inhibitory instructions. 
Mednick (1958c) attempted to establish
such a set in his instructions to S
and found that, using only the data of
the 30 Ss with the fastest reaction
times, he obtained a relatively
consistent, positively accelerated
gradient of latency. In contrast, the
gradient of the 30 slowest Ss was
extremely inconsistent. This study was
then replicated on a separate group,
and the findings were consistent with
the hypothesis that the faster Ss
would demonstrate a “typical” latency
gradient of generalization. In a study
of the GSG as a function of age in
children, Mednick and Lehtinen (1957),
using this apparatus, predicted and
found that younger children were less
restrained by the inhibitory
instructions and demonstrated a
latency gradient. These two studies
seem to support Gibson’s
interpretation of the failure to find
latency GSGs. Other studies using the
Brown, Bilodeau, and Baron apparatus,
or some modification of it, have
investigated the effect of level of
manifest anxiety and experimental
naivete on SG (Mednick, 1957),
summation of GSGs (Bilodeau, Brown, &
Meryman, 1956), the central tendency
effect on GSG (Gewirtz, Jones, &
Waerneryd, 1956), the correlation of
SG with ethnocentric attitudes
(Arnhoff, 1956, 1957), and
intelligence (Arnhoff & Loy, 1957).
Most of these studies will be
discussed below under appropriate
headings.

Conclusions: From these studies 
we might reasonably assert the fol- 
lowing with respect to SG along a 
spatial continuum: 

1. With human Ss, GSGs can be 
obtained for both voluntary and in- 
voluntary response dimensions with 
statistically significant gradients reg- 
ularly observed. 

2. In voluntary response situa- 
tions, with inhibitory instructions, 
using latency as the response meas- 
ure, GSGs have not been regularly 
observed. However, some recent 
studies have supported explanations 
of this failure and have reported 
gradients. 


Pitch Generalization


In studies of pitch generalization,
some special problems present
themselves. If frequency is varied
(keeping intensity constant), apparent
loudness varies concomitantly. Thus,
stimulus intensity dynamism may become
a complicating factor. Secondly, human
Ss have had extensive experience in
differentiating pitch, so that simple
use of cps to characterize points on a
pitch similarity continuum seems
naive. There are points on the pitch
similarity continuum which are well
separated in terms of cps but are
actually difficult for Ss to
discriminate, e.g., octaves.

Hovland (1937a) handled these
problems by establishing a jnd scale,
using four points separated by 25
jnds as his test stimuli. He varied
intensity in order to equate for
apparent loudness. In this study (CR
was GSR; UCS was electric shock), a
regular GSG was observed with
significant differences between test
points. In a repetition of Hovland’s
study, Littman (1949) used an improved
GSR apparatus. In the early test
trials, generalization was complete
(responses to test stimuli as great as
responses to Stimulus 0), with a
significant GSG appearing only in the
fourth cycle of SG testing. Littman
explains the relative irregularity and
extreme height of his plotted
gradients as possibly being due to his
Ss being less acclimated and less
experienced in the GSR-shock situation
than were Hovland’s Ss. In support of
Littman’s interpretation, Mednick
(1957) has since shown that
generalization responsiveness is
indeed greater for relatively naive
Ss.

In view of the complications involved
in testing in more than one octave,
Humphreys (1939) questioned the use
of as broad a range of test stimuli
(153-1967 cps) as em-


Studies. He 


Those Ss receiving 50% reinforcement
produced a GSG that was elevated.


octave of Stimulus 0 had unexpectedly
strong response eliciting power
(octave effect). A study by Wickens,
Schroder, and Snide (1954) also made
use of the Hovland paradigm, except
that only the three lowest tones were
employed (153, 468, and 1000 cps) and
each S was tested on only one GS.
Regular GSGs were observed. Le Ny
(1957a, 1957b) employed the same
tones as Hovland but required a
voluntary key-pressing response. He
found regular gradients.

Garmezy (1952) used two sets of 
five tones, one ranging from 500 to 
560 cps and the other ranging from 
700 to 760 cps. Ss were trained 
either on 515 or 715 cps and were 
tested with all the GSs in that set.
The response consisted of pushing or 
pulling a handle. Conditions of rein- 
forcement were varied and smooth 
regular gradients were produced by 
all conditions of training. 

The octave effect was also reported
by Blackwell and Schlosberg (1943),
who studied rats running for food
and away from shock in a straight
alley maze. Ss were trained to run to
all stimuli (10, 8, 7, 5, and 3 kc.) and


sponse was produced wi
at 5 kc., the octave of 10 kc. A second
study aimed at minimizing the
secondary harmonic effects at the
octave produced an even more
apparent octave effect.

In a study of the effect of y
attitudes upon SG 
made use of a 


es F, B, e, a, 
While attitude
(cooperative, noncooperative)
affected conditioning, it did not
influence generalization.


pitches, 
data indicate that 


serve anything other than a very
rough two step gradient. However,
inasmuch as attitude had no effect
on generalization, we are able to
combine the data for all the attitude
groups for each of the generalization
stimuli. Thus, we may reprocess the
reported data to plot a GSG along a
continuum of unusual breadth, almost
four octaves. The resultant gradient
is very regular and steeply
descending.

Jenkins and Harrison (1958) made 
use of an operant conditioning 
method for studying SG in pigeons 
which was suggested by Guttman 
and Kalish (1956). They conducted 
two experiments in pitch generaliza- 
tion on the effect of differential train- 
ing (CR extinguished to absence of 
CS) on SG. They found that differ- 
ential training produced steeper gra- 
dients than nondifferential training. 

The following may be reasonably 
asserted regarding SG along a pitch 
continuum: 

1. Regular and reliable gradients
are observed. 

2. Augmented responsiveness to 
the octave of Stimulus 0 has been 
noted. 

3. As with some other continua, 
initial generalization test trials often 
elicit augmented SG responsiveness. 
This is often observed in studies 
utilizing a noxious UCS. It would be 
desirable for such studies to institute 
a pseudoconditioning control to see 
what role sensitization is playing
in this situation. 

4. Humphreys suggests that in view
of the complications inherent in
working with pitch SG (particularly
the octave effect), future research
work should consider such continua
as hue. However, in view of the
quirks of the Δλ/λ incremental hue
discrimination curve, nonmonotonic
gradients should not be unexpected in
studies of this continuum also.


Intensity Generalization 


In a theoretical article, Hull (1949) 
discusses an aspect of intensity gen- 
eralization studies which should be 
initially clarified; other things re- 
maining constant, an organism will 
react with greater amplitude, fre- 
quency, and speed to a stimulus of 
greater intensity. Since all studies of 
intensity generalization use training 
stimuli of different intensity from the 
generalization test stimuli, according 
to Hull’s postulate, tests for general- 
ization should show strength of re- 
activity to be a joint function of SG 
and stimulus intensity. Hull predicts 
different forms of observed gradients 
for test stimulus intensities ranging 
above and below the CS. He predicts 
convex upward gradients when test 
stimulus intensities range above the 
CS, and concave upward gradients 
when test intensities range below the 
CS. His postulate also leads to the 
prediction that the general trend of 
the gradient originating at a rela- 
tively strong stimulus intensity and 
generalizing toward weaker stimulus 
intensities will have a greater down- 
ward slope than will the gradient ex- 
tending in the opposite direction be- 
tween the same stimulus intensities. 
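
The asymmetry that follows from a joint action of SG and stimulus intensity can be made concrete with a small numerical sketch. The functional forms below are assumptions chosen only for illustration (an exponential generalization gradient over log-intensity units and a linear intensity factor); they are not Hull's own equations, and the parameter values are hypothetical.

```python
import math

def generalized_strength(train_log_i, test_log_i, decay=0.5):
    """Assumed exponential SG gradient over distance in log-intensity units."""
    return math.exp(-decay * abs(test_log_i - train_log_i))

def intensity_dynamism(test_log_i, base=0.4, gain=0.2):
    """Assumed linear growth of reactivity with log intensity (illustrative)."""
    return base + gain * test_log_i

def reactivity(train_log_i, test_log_i):
    """Joint (multiplicative) action of SG and stimulus intensity."""
    return (generalized_strength(train_log_i, test_log_i)
            * intensity_dynamism(test_log_i))

def overall_drop(train_log_i, far_log_i):
    """Fall in reactivity from the CS to the most distant test stimulus."""
    return (reactivity(train_log_i, train_log_i)
            - reactivity(train_log_i, far_log_i))

weak, strong = 0.0, 3.0  # hypothetical end points of the continuum

# "High intensity trained": CS is the strong stimulus, tested toward weak.
high_trained_drop = overall_drop(strong, weak)
# "Low intensity trained": CS is the weak stimulus, tested toward strong.
low_trained_drop = overall_drop(weak, strong)

print(f"high intensity trained: drop = {high_trained_drop:.3f}")
print(f"low intensity trained:  drop = {low_trained_drop:.3f}")
```

Under these assumed forms the gradient falls more steeply when it originates at the strong stimulus than when it extends in the opposite direction between the same two intensities, which is the direction of the prediction described above. A different choice of scale or function would change the sizes of the two drops.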

The fact that the shape of a curve 
may be modified by scale transforma- 
tion mitigates the importance of 
Hull’s notions unless there is some 
specification of the stimulus and re- 
sponse units. This question is dis- 
cussed again in the section on the 
shape of the gradient. 

Intensity of direct light source. Hull 
finds confirmation for his postulate 
of the joint action of SG and stimulus 
intensity in a study by Brown (1942). 
This study may be considered with a 
group of experiments in which Ss 
were trained to respond to a certain 
intensity of direct light (as opposed 
to intensity of reflected light) and 
tested with other intensities. Brown’s
study tested the adient pull of rats to 
the stimulus of a screen of 0.2, 5, or 
5000 apparent foot-candles illumina- 
tion which, in turn, was associated 
with food. In accordance with Hull’s 
prediction, the GSG (of a maze-run- 
ning, harness-pulling response) was 
steeper for “high intensity trained” 
animals (trained on a high intensity 
stimulus and tested for SG on lower 
intensities, CS being the highest 
stimulus) than for “low intensity 


est stimulus). These terms, “high 
intensity trained” and “low inten- 
sity trained” will be used henceforth 


for high intensity 
trained rats than for those low in- 


gradient. 
In an experiment with preschool
children, Spiker (1956b) differentially
reinforced a white
and a blue (negative) light f
pressing response. He then tested
for generalization with three
additional white lights; half of the
Ss were high intensity trained and
half were low intensity trained. The
gradient for the high intensity
trained was, in accordance with Hull’s
prediction, significantly steeper than
the gradient for low intensity trained.
These studies, varying intensity of 


intensity dynamism. 

Intensity of reflected light source.
Another group of studies, yielding
results somewhat incompatible with
those reported above, used intensity
of reflected light as the
generalization continuum. Raben (1949)
trained rats to run in the brightest
of five painted runways (varying in
steps of 0.33 log units) and then
tested for GSG of response latency in
subsequent test sessions. She noted
that the GSG went from convex upward


progressive decrement in SG
responsiveness. Grandine and Harlow


decreasing GSG on the first trial: 
authors state that the obtained GSGs 


did not differ significantly from a
straight line.


Montgomery (1953), 


trained 
black-white discrimination and then
tested generalization to five degrees
of relative brightness. They found a
straight line gradient.

In an ingenious adaptation of the 
Brown, Clarke, and Stein (1958) 
method, Bass (1958) used variations 
in the grayness of the projected sil- 
houette of a jockey on a running 
horse as her SG continuum. Every 
exposure of the silhouette represented 
another race; S was required to guess 
whether the horse would win. The 
Stimulus 0 horse won 80% of its 
races, while the other three horses 
(Stimuli 1, 2 and 3) won 40% of their 
races. It was assumed and found that 
the tendency to guess “win” would 
generalize from Stimulus 0 to the 
GSs as a function of their distance 
from Stimulus 0 on the grayness 
scale. This method could be readily 
adapted for use with children. 

In general, studies using the in- 
tensity of direct light source as a 
continuum tend to support Hull’s 
stimulus intensity dynamism princi- 
ple; studies using reflected light tend 
to show ambiguous or negative evi- 
dence. Inspection of the two groups 
of studies suggests that the varia- 
tions in reflected light (in the range 
of intensities thus far employed) 
have not produced as compelling a 
difference in intensity as variations 
in the intensity of a direct light
source.


Intensity of sound. Hovland 
(1937b) tested for generalization of a 
CR to varying degrees of intensity 
of a tone; CS frequency was 1000 cps, 
the GSs were each 50 jnds apart and 
placed 40, 60, 74, and 86 dbs. above 
the threshold. Human Ss were used 
with PGR as CR and electric shock 
as UCS. Hovland trained half of the 
Ss to low intensity, half to high, with
a counterbalanced design. While he
does not plot differential gradients
for low intensity trained as opposed 
to high intensity trained, it is appar- 
ent from inspection of the reported 
data that the slope of the gradient 
for the high intensity trained group 
is greater than that for low intensity 
trained. Fink and Patton (1953), 
studying learned drinking responses 
in white rats, found that the GSG 
for high intensity trained Ss tended 
to have steeper slope than that for 
low intensity trained Ss. 

Miller and Green (1954) tested 
for generalization of extinction effects 
along a continuum of buzzer inten- 
sity. Using rats motivated by shock 
to run to a buzzer signal along a T 
maze, they reported a GSG with 
greater slope for high intensity 
trained animals. The gradient of ex- 
tinction trials for the low intensity 
trained group approached a hori- 
zontal line, which result is explained 
in part by the greater number of 
trials which this group took to learn 
as compared with the high group. 

From studies of sound intensities, 
the theory of SG responsiveness as a 
joint function of SG and stimulus in- 
tensity is supported. 

Drive intensity SG. Stimulated by 
some studies showing indirect evi- 
dence of the generalization of re- 
sponses to proprioceptive stimula- 
tion associated with specific dura- 
tions of food deprivation, Yama- 
guchi (1952) explicitly designed a 
factorial study to test for this phe- 
nomenon. He deprived rats of food 
for from 3 to 72 hrs. during training 
and tested for generalization along 
the continuum of proprioceptive 
stimulation resulting from different 
degrees of hunger at each depriva- 
tion level. He found significant 
GSGs; the slope of the GSG of the 
high intensity trained group was
significantly greater than the slope of
the GSG for those that were low
intensity trained. What is of special
interest is the fact that
generalization could be measured along
this unusual continuum.

Some new interpretations. Johnsgard
(1957), following up some work by
Bragiel and Perkins (1954), has
presented evidence suggesting that it
is the degree of stimulus-background
contrast rather than absolute stimulus
intensity which is the variable
producing the stimulus intensity
dynamism effect. Johnsgard had rats
run on an elevated runway to stimulus
cards of varying reflectance placed
before a screen of relatively medium
intensity. He made more [...] sponses
and of latency from consideration of
the contrast of stimulus card and
background than from the absolute
intensity of the stimulus card.

Perkins (1953) has also suggested a
[...] stimulus intensity. He maintains
that during the conditioning pro- [...]
up to the absence of the CS, i.e.,
response is inhibited at those times
that intensity equals zero. The
inhibitory GSG originating at the
point of zero intensity dampens
response [...] He further points [...]
57), that in the [...] direct
light-intensity [...] to higher
stimulus intensities. This
interpretation may also explain the
ambiguity of the findings in the
reflected light intensity studies.

Hull (1950b, 1951, 1952) himself has
presented a revision of his original
ideas on stimulus intensity dynamism.
This revision has received little
research attention. The revision
suggests that intensity enters as a
variable both during training
(affecting sHr) and during response
evocation. (In the earlier paper no
effect on sHr was postulated.) This
revision actually has only a minor
effect on his earlier predictions.
Heyman (1957), in an excellently
conceived study, evaluated the revised
position. He used high intensity and
low intensity trained rats approaching
and pushing open a door covered with
grey stimulus paper of appropriate
reflectance. He found regular GSGs
with slope and shape in agreement with
Hull’s predictions. He did not find
that habit strength (sHr) was affected
by the intensity of the training
stimulus. His results suggest that
Hull’s earlier position is better able
to predict his results than [...]

Conclusions. The following
conclusions may be reasonably asserted
regarding SG along an intensity
continuum:

1. Statistically significant
gradients are regularly observed.

2. [...] agreement with Heyman [...]
with GSs of [...]

3. [...] results in accordance with
Conclusion 2.

4. Some recent work suggests that 
stimulus background contrast is an 
important variable determining the 
effective intensity of the stimulus. 


Size Generalization 


A number of experiments have in- 
vestigated generalization along the 
dimension of stimulus size. The 
basic paradigm is similar to that of 
other SG studies; S is trained to re- 
spond to a CS of a given size and is 
then tested for generalization to 
stimuli above and/or below the CS 
on the particular size dimension
being employed. According to most
theoretical formulations, the amount
of generalization should be inversely
proportional to the distance of the
given stimulus from the CS. Although
this expectation is, to a large
extent, supported by the data, there 
are various factors which complicate 
the situation and which make it diffi- 
cult to obtain regular gradients. 

The two earliest studies were con- 
ducted by Gulliksen and by McKin- 
ney. Gulliksen (1932) used size and 
direction of angles as the stimuli and 
the moving of a handle to the left or 
right as the response. A gradient 
along the size dimension was re- 
ported. McKinney (1933) employed 
geometric line drawings that varied 
in the total length of the lines used. 
The results show “that the percentage
of transfer decreases as the amount
of alteration increases” (p. 857).

More recently, a series of experi- 
ments has been conducted employing 
circles of varying areas as stimuli. 
All of these studies used an apparatus 
designed by Grice, consisting of a 
straight alley maze 24 in. long with a 
starting box at one end and the
stimulus at the other end. The
stimulus circle has a small door 
through which a rat may reach the 
feeding dish. The maze can be ro- 
tated so that S does not have to be 
replaced at the end of a trial. 

In the initial study by Grice and 
Saltz (1950), five white circles (79, 
63, 50, 32, and 20 sq. in.) were used. 
Ss were trained either to the largest 
or smallest circle, and were tested on 
all stimuli in an extinction situation. 
Bidirectional GSGs were obtained. 

Kling (1952) investigated gen- 
eralization of extinction using the 
Grice apparatus. Ss were trained to 
respond to two different stimuli 
which were presented singly. One 
of the two stimuli was then presented 
alone for 15 nonreinforced trials. 
Testing consisted of a presentation 
of the other stimulus. The measure 
of generalization was the latency of 
the first response to the test stimulus. 
Smooth gradients were obtained for
generalization from large to small
and from small to large.

Margolius (1955) repeated Grice’s 
experiment varying the number of re- 
inforced trials during the training 
periods, and employing only the 79 
sq. in. stimulus as the CS. Using the 
mean latency of the first three test 
trials as a measure, smooth gradients 
were obtained. Two other measures 
—number of responses in 30 trials 
and number of responses in 60 sec.— 
also produced regular gradients 
which weren’t as precise as the la- 
tency gradient. 

A study by Brush, Bush, Jenkins 
and Whiting (1952) is quite similar 
to the three just discussed. With 
pigeons as Ss, an illuminated spot of 
varying size on a pecking target in a 
Skinner box was used as the stimulus. 
Generalization was tested on both 
sides of the CS; rate of response was 
the response measure. Asymmetrical
gradients were produced, that on the
larger side being higher. However,
when the log of size was substituted
for absolute size, the gradients
became more symmetrical. Jenkins,
Pascal, and Walker (1958) also used
this operant technique and report
smooth and fairly regular GSGs.

Quite a few studies have investigated
SG using the height of the stimulus
as the independent variable. Grandine
and Harlow (1948) trained monkeys to
find food in a dish under a block of
a given height. The Ss were then
presented with the CS and a stimulus
of a different height. The choice of
the nonrewarded stimulus was
considered an SG response. First
trial data yielded regular GSGs.

A study by Buss (1950) followed
one of the basic concept formation
paradigms.
blocks of various heights, colors, 


were reinforced. Generalization
gradients were produced along the
height dimension both by number
and latency of the “vec” responses.
This is one of the few experiments in 
this area, or in fact in the whole area 
of stimulus generalization, in which 
human Ss, making a voluntary re- 
sponse, produced a gradient based 


O positive re- 
sponses (i.e., lifting either key), 
rather than a single positive response 
and inhibiting instructions as in 
comparable studies. Two other 
studies (Buss, Weiner, & Buss, 1954;
Buss, 1955), making use of the same
basic method and varying mode of
reinforcement, also produced smooth
GSGs.

Rosenbaum (1953) using psychia- 
tric and normal Ss, employed rec- 
tangles of light of various heights as 
the stimuli. The response consisted 
of moving a slider along a wire. 
Strong and weak shock as well as a 
buzzer were used to increase the 
speed of the response and to heighten 
anxiety. Regular gradients of ampli- 
tude were reported for all conditions 
except for the group of normal Ss 
that received the buzzer. A
nonsignificant reverse gradient of
latency was found, i.e., latency
varied inversely as a function of GS
dissimilarity. Unfortunately, during
the testing session three CS booster
trials were regularly presented
between test stimuli. This would tend
to prepare Ss for the test stimulus
and con-


the experiment is difficult 


In a study by Botwinick (1953), Ss 
were trained to a sm 


ported for frequenc 
Using all trials 
for latency, 
measures, 


Grant and Schiller conducted a
study with the GSR as the response
measure and shock as the UCS. A


ut a reversal oc 
the largest stimulus (1953), 


In general, regular gradients have
been observed for generalization
along the size dimension. In some
cases, however, reversals and asym- 
metries have been observed. These 
have uniformly occurred in cases 
where Ss have been trained to a rela- 
tively small CS and tested on larger 
GSs. It seems likely, as pointed out 
by Grice and Saltz, that this is due 
to a general effect which tends to aug- 
ment responsiveness to larger stim- 
uli. In view of the fact that large 
stimuli appear brighter than small 
stimuli, the effect may be indirectly 
due to stimulus intensity dynamism. 

An alternative or complementary 
explanation could be based on the 
problem of defining equal distances 
between stimuli. If equal physical 
distances are used, it is evident that 
the psychological distances between 
the smaller stimuli are greater than 
those between the larger stimuli. 
This would make the larger stimuli 
closer to each other and to the CS 
than are the smaller stimuli, produc- 
ing an elevated gradient. Brush et al. 
(1952), have demonstrated that us- 
ing logarithmic scales tends to com- 
pensate for this effect. 
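
The scaling argument can be illustrated with a short calculation. The sketch below assumes that psychological distance is proportional to distance on a logarithmic size scale, which is only one possible scaling assumption; the stimulus sizes are hypothetical and are not taken from any of the studies cited.

```python
import math

def log_distance(cs_size, gs_size):
    """Distance between two sizes on a log10 scale (assumed psychological scale)."""
    return abs(math.log10(gs_size) - math.log10(cs_size))

cs = 30.0                                # hypothetical training stimulus size
sizes = [10.0, 20.0, 30.0, 40.0, 50.0]   # equal physical steps around the CS

for gs in sizes:
    physical = abs(gs - cs)
    print(f"size {gs:4.0f}: physical distance {physical:4.0f}, "
          f"log distance {log_distance(cs, gs):.3f}")
```

Although the stimuli of size 10 and size 50 are equally far from the CS in physical units, the larger stimulus is closer to the CS in log units. On this assumption, a gradient plotted against physical size would appear elevated on the larger side, and replotting against log size would reduce the asymmetry.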

Conclusions. From these studies, 
we can reasonably assert the follow- 
ing with respect to SG along the 
dimension of size: 

1. For both human and infra- 
human Ss, for both voluntary and 
involuntary response measures, for 
measures based on frequency, ampli- 
tude and in some instances latency 
of response, regular GSGs are ob- 
served. 

2. The use of a slider response for 
human Ss, such as used by Gulliksen, 
Rosenbaum, and Botwinick, provides 
excellent amplitude and frequency 
data. However, latency GSGs are 
not observed. 

3. A group of studies from widely
separate laboratories which made use
of the Grice apparatus have obtained
highly consonant and meaningful
results. This recommends this
method quite highly for animal work.


Drive Stimulus Generalization 


In this area we ask: (a) whether a 
response can be trained to the pro- 
prioceptive stimulation arising from 
a deprivation state; and (b) if so, 
will generalization of this response 
occur to proprioceptive stimulation 
which arises from similar depriva- 
tion states. In an attempt to answer 
these questions, Yamaguchi (1952) 
used a modified factorial design, 
Skinner box bar pressing response, 
and varying degrees of internal re- 
sponse to food deprivation as stimuli, 
with rats as Ss. He found
generalization along this drive
stimulus continuum.


Temporal Generalization 


Pavlov (1927) demonstrated that 
dogs could learn to respond to a time 
interval as a conditioned stimulus. 
Several studies have subsequently 
investigated SG along this dimen- 
sion. Czehura trained four dogs to 
respond to a one second silent inter- 
val between two tones differing in 
pitch. He then tested responsiveness 
to three additional durations of silent 
interval—3, 9, and 26 sec.—observ- 
ing a regular gradient. 

Rosenbaum (1951) trained rats to 
press a bar every 60 sec. to obtain 
food pellets. During testing, seven 
additional intervals were used—14, 
30, 45, 75, 90, 105, and 120 sec. Us- 
ing a latency measure of response 
strength, a significant gradient, de- 
creasing in height as the CS was ap- 
proached, was obtained; past the 60 
sec., Stimulus 0 point, latencies 
tended to remain at the same level 
as those for the CS interval. The 
lack of a regular gradient beyond the
CS interval was interpreted as the 
result of S being set to respond at 60 
sec., at which point internal stimulus 
cues would be maximal. Therefore, 
for intervals greater than 60 sec. re- 
sponse latencies would not be ex- 
pected to increase. A question which 
can be raised regarding this study is 
the positioning of the animal. As 
the testing interval approached the 
CS interval, the animal could have 
been moving closer to the bar, so 
that proximity would result in de- 
creased latency. After 60 sec. the 
animal might have remained poised 
at the bar resulting in the observed 
plateau. 

A further study of the temporal 
dimension was that by Fink and 
Davis (1951) with human Ss giving a 
voluntary key-pressing response to 
three series of tones of differing dura- 
tion. Muscle action potential data 
was also recorded. In one series, Ss 
were told to respond to all tones and 
were presented with tones of 2, 2.25, 
2.5, 3.0 and 4.0 sec. In the second 
series (both given to all Ss) only two- 
second tones were presented. During 
testing, 10 trials were given with the
two-second durations, then 10 trials
with the other four durations. Fairly
regular convex upward gradients of
both muscle action potential and key
pressing were obtained. It is of
interest in this study that the key
pressing response yielded results
which were directly comparable to
the results obtained from the muscle
action potential response. These
results on muscle action potential in
a sense support the use of voluntary
response measures with human Ss.
Fink and Davis discuss the problem
of choosing an interval scale that is
most appropriate for the independent
variable. They suggest that any
measure is as good as another if
followed consistently; logs are of
particular value for human Ss when the
temporal intervals used are very
small.

A final study in this area is that 
by Mednick (1959), who used taped 
tones differing only in duration (CS
= 2.7 sec.; GSs = 3, 3.3, 3.6, 3.9 sec.).
Twenty CS training trials were 
given and a regular generalization 
gradient of a voluntary key press- 
ing response was found during test- 
ing. 

It may be concluded that regular, 
statistically significant GSGs may 
be obtained along a temporal dimen- 
sion. It would seem to be advanta- 
geous for Es to adopt a consistent 
method of presenting gradients in 
terms of the interval scale
expressing the independent variable.
Log units appear to be the most
generally applicable.


Hue Generalization 


Utilizing a modified version of the
Skinner automatic key-pecking pigeon
apparatus, Guttman and Kalish are
conducting a series of careful
investigations of SG along the hue
continuum. The stimulus is produced
by a monochromator and displayed on
the translucent pecking key.
Typically, the pigeon is brought
to a high level of responsiveness by 
means of a variable interval rein- 
forcement schedule and generaliza- 
tion testing is carried on under ex- 
tinction conditions. Regular gradi- 
ents are almost invariably obtained 
with this technique. It also has the 
advantage of enabling E to obtain 
GSGs from individual Ss. 

In an early experiment Guttman 
and Kalish (1956) studied the rela- 
tionship between the degree of dis- 
criminability of changes in a stimulus 
continuum and the shape of the
generalization gradient along that
continuum. One might predict an
inverse relationship between the
incremental discriminability of a
continuum (as measured by Δλ) and
the amount of generalization. However,
these variables were found to be
relatively independent of each other.
Kalish (1958) adapted this apparatus
for humans and found an inverse
relationship between generalization
and discriminability along the wave
length continuum. The differences in
procedure and Ss between the Guttman
and Kalish (1956) and Kalish (1958)
studies make it difficult to explain
these opposing results.

In a later study on summation of 
gradients (Kalish & Guttman, 1957) 
the method proved successful in pro- 
ducing regular GSGs along the hue 
continuum. An outstanding doctoral
dissertation by Honig (1958), using
the same basic procedure,
investigated problems in
transposition and stimulus
preference. Of special interest is
the finding that stimulus preference 
may be predicted from knowledge of 
the underlying generalization gradi- 
ents. Hanson (1957) had pigeons 
learn to peck at a key for one wave- 
length and not to peck for another. 
The pigeons were then tested for gen- 
eralization along the wave length 
continuum that surrounded these 
two stimuli. As might be expected, 
generalization responsiveness on the 
side of the negative stimulus was 
greatly reduced. What is surprising 
is that the discrimination training did 
not reduce the total amount of gen- 
eralization responsiveness. Instead 
the entire GSG seemed to move up 
the continuum away from the negative
trained stimulus. The maximum point
of the curve was also displaced in
this direction. In an informative
paper on the problem of the scaling 
of the independent variable, Gutt- 
man (1956) reports on some current 
work from his laboratory. 


VARIABLES INFLUENCING 
GENERALIZATION 


Stimulus Generalization as a Function 
of Drive Level 


Studies in this area have concerned 
themselves with the effect of hunger, 
sex, and fear-anxiety on the GSG. 

Hunger. Brown (1942), using the 
straight alley maze, rat-pulling-har- 
ness apparatus, varied the number of 
hours of pretest food deprivation. He 
noted regular gradients with the 
amplitude and speed of response 
varying directly with number of 
hours of food deprivation. The GSG 
of the high drive group also tended 
to be steeper. In a later study of 
similar design (Brown, 1948), these 
findings were replicated. An addi- 
tional condition of strong electric 
shock produced results which sug- 
gest that the gradient of avoidance 
(shock) behavior is steeper than that 
for approach (hunger) behavior. 

Rosenbaum (1951), studying gen- 
eralization along a temporal dimen- 
sion, using a modified Skinner box, 
varied drive by depriving two groups 
of rats for 22 hours and then feeding 
one group a small quantity of food 
just before testing. He noted no sig- 
nificant differences between the GSGs 
of his two groups when the fed group 
received only 4 gms. of food. When 
this amount was increased to 10 gms. 
in a second experiment, the differ- 
ence between the groups was signifi- 
cant. The unfed group demonstrated 
greater SG responsiveness. 

The more pigeons are deprived of 
food the more generalization re- 
sponsiveness they exhibit (Jenkins, 
Pascal, & Walker, 1958). The pi- 
geons were tested using operant tech- 
niques with the size of the pecking 
key serving as the generalization 
continuum. 

None of these studies utilized a 
factorial design; this factor would 
tend to complicate the interpreta- 
tion of their findings due to the possi- 
bility of a drive stimulus SG effect. 
Despite this, their results make it 
reasonable to conclude that increased 
time of food deprivation will lead to 
increased SG responsiveness. 

Electric Shock. Rosenbaum (1953) 
used three degrees of noxious stimu- 
lation (strong shock, weak shock, and 
buzzer) [...] drive state with human 
Ss. While the weak shock and buzzer 
conditions did not produce differing 
results, the [...] 

In one of the series of studies on 
displacement, Murray and Miller 
(1952) compared the GSG under 
three levels of electric shock finding 
greater generalization responsiveness 
for the greater shock condition. 
Brown (1948) reported a similar 
finding. Relatively strong shock 
produced relatively elevated gradi- 
ents. 

Sex. Testing the effects of tes- 
tosterone propionate upon the copu- 
latory behavior of sexually [...] 
of testosterone propionate the less 
the incentive need resemble a 
recep[...] and Walton (1938) obtained 
similar results. 

Situational factor. Mednick (1957) 
compared the GSGs of experimen- 
tally naive and experimentally so- 
phisticated Ss finding that the naive 
Ss demonstrated greater SG re- 
sponsiveness than the sophisticated 
Ss. Inasmuch as this effect was espe- 
cially marked in Ss who scored high 
on the Taylor Manifest Anxiety 
Scale he interpreted this finding in 
terms of situational [...] 

It may be reasonably concluded 
from the above evidence that any 
condition which will increase the 
drive state will result in increased 
SG responsiveness. 


Individual Differences in Generaliza- 
tion 


The study of individual differences 
is concerned with a search for cor- 
relates of SG reactiveness. The bulk 
of the work has dealt with develop- 
mental factors, personality variables, 
schizophrenia, and brain damage. In 
most instances the intent is utiliza- 
tion of SG as an explanatory concept. 
Thus an excess in SG reactiveness is 
seen by some writers as an important 
basis of the disorder of thinking in 
schizophrenia. 

However, there is another possible 
direction that this research might 
take. Basic problems in learning [...] 
form well in such a context. For ex- 
ample, Gibson (1940) hypothesized 
that [...] 
should learn a high intralist sim- 
ilarity list with little interference 
(relative to its performance on a low 
intralist similarity list). Mednick 
(1955) has shown that individuals 
with cortical brain damage evidence 
reduced SG reactiveness. On the 
basis of this result, Carson (1958) 
hypothesized and demonstrated that 
cortically brain damaged individuals 
showed relatively little decrement 
in learning as a function of increas- 
ing intralist similarity. In studies of 
this type, it is necessary that the ef- 
fect be shown in the interaction term 
since simple differences between 
groups could be due to a myriad of 
other factors. 

Manifest anxiety. Mednick (1957), 
comparing high and low scoring col- 
lege students on the Taylor Manifest 
Anxiety Scale (MAS) found that the 
highly anxious Ss showed more spa- 
tial generalization. Studies by Hil- 
gard, Jones, and Kaplan (1951) on 
eyelid conditioning, Wenar (1953) 
on temporal conditioning, and M. T. 
Mednick (1957) on PGR condition- 
ing and mediated generalization, have 
also demonstrated greater gener- 
alization reactiveness on the part of 
high anxious Ss. In the study by 
M. T. Mednick it was found that 
conditioning level of the anxiety 
groups was directly related to degree 
of anxiety. In order to evaluate gen- 
eralization responsiveness independ- 
ent of conditioning level, she com- 
pared anxiety groups equated for 
conditioning level; the nature of her 
findings was not changed. It is im- 
portant that all studies attempting 
to demonstrate group differences in 
SG responsiveness either equate their 
groups for conditioning level or use 
a measure of relative generalization 
(ratio of generalization response to 
conditioned response). 
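
The point about relative generalization can be illustrated with a minimal sketch. The function name and the response magnitudes below are invented for illustration and are not data from any study cited here; the only thing taken from the text is the definition of the ratio itself.

```python
def relative_generalization(gs_response, cs_response):
    """Ratio of the response to the generalization stimulus (GS)
    to the response to the conditioned stimulus (CS)."""
    if cs_response <= 0:
        raise ValueError("conditioned response strength must be positive")
    return gs_response / cs_response


# Hypothetical mean response magnitudes in arbitrary units (NOT real data).
high_anxious = {"cs": 10.0, "gs": 6.0}
low_anxious = {"cs": 5.0, "gs": 3.0}

# The absolute generalized response differs between the groups ...
absolute_difference = high_anxious["gs"] - low_anxious["gs"]

# ... but the relative measure shows the same proportion of spread in both.
rel_high = relative_generalization(high_anxious["gs"], high_anxious["cs"])
rel_low = relative_generalization(low_anxious["gs"], low_anxious["cs"])
```

In this invented case the groups differ in absolute generalized response only because they differ in conditioning level, which is the confound the ratio measure is meant to remove.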


A study by Buss (1955) on psy- 
chiatric patients found no differences 
as a function of MAS scores; he sug- 
gests that the MAS is not an appro- 
priate test for psychiatric patients. 
Fager and Knopf (1958) agree with 
Buss’ conclusion on the basis of 
their experiment with psychiatric 
patients. They found no relation- 
ship between MAS scores and SG. 

Clinical groups. On the basis of 
clinical observation, Schilder (1939) 
first remarked on the difficulties 
schizophrenics have in tasks involv- 
ing differentiation. Bender and 
Schilder (1930), studying conditioned 
withdrawal from shock, noted ex- 
treme over-generalization by schizo- 
phrenic Ss. Cameron (1951) has 
spoken of over-inclusion and the 
broadening of the GSG on the part 
of schizophrenics. Garmezy (1952), 
studying generalization along the 
dimension of pitch, found that schiz- 
ophrenics demonstrated more rela- 
tive generalization than normals. 
This effect was especially marked 
under conditions of socially admin- 
istered punishment. Mednick (1955), 
using the dimension of space, found 
that schizophrenics generalized more 
than normals. Dunn (1954) tested 
schizophrenics with social and non- 
social materials and found relatively 
greater generalization for schizo- 
phrenics with the social materials. 
The “social” generalization continua 
were very cleverly constructed, in 
one instance, consisting of changes 
in the angle of elevation of the 
pointing arm of a scolding mother. 
Rosenbaum (1953) utilized psychi- 
atric patients in a generalization 
study, but did not compare them 
with the normal Ss in the study. 
However, inspection of the reported 
data suggests that there was little 
apparent difference in generaliza- 
tion between the groups. Mednick 
(1958a) has presented a learning the- 
ory approach to research in schizo- 
phrenia which leans heavily on SG 
theory. 

Eriksen (1954) compared size gen- 
eralization in “hysterics” and 
“psychasthenics” as defined by ex- 
treme scores on the appropriate 
MMPI scales. The “hysterics” dem- 
onstrated more generalization. Both 
groups showed more generalization 
when instructed that they could avoid 
the UCS (electric shock) than when 
told it was unavoidable. 

Personality tests. Arnhoff (1956) re- 
ported that high generalizers showed 
more ethnic prejudice (E scale), but 
on replication of his study (1957) he 
found the difference did not hold up. 
In this laboratory, similar fluctuating 
results have been observed across a 
group of four studies using the highly 
related F scale (authoritarianism). 
Analysis of the subscales of the F 
scale indicates that the Projectivity 
subscale consistently holds up. High 
“projectors” tend, significantly, to be 
high generalizers. In the process of 
carrying out generalization research 
at this laboratory, various additional 
personality measures have been in- 
cidentally correlated with generaliza- 
tion responsiveness. We will men- 
tion these briefly. No relationship 
has been noted between SG respon- 
siveness and field independence (Wit- 
kin, 1954) or need achievement (Mc- 
Clelland, Atkinson, Clarke, & Lowell, 
1953); on the other hand, those low 


Age. Mednick and Lehtinen (1957) 
found that SG responsiveness varied 
inversely as a function of age in 
school children; no relationship was 
found for adults in a separate study 
by Mednick (1955). With the 
younger children, Mednick and Leh- 
tinen noted a latency gradient while 
none was noted for the older chil- 
dren. In line with these results, 
Reiss (1946) found generalization 
along a gradient of homonymy to 
vary inversely as a function of age in 
children. 

Brain damage. In a study of pa- 
tients suffering from a variety of 
types and locations of cortical brain 
damage, Mednick (1955) noted ex- 
treme curtailing of generalization 
responsiveness. These results are 
similar to those of Smith (1951) who 
noted decreased transfer (based on 
deficiency in SG responsiveness) from 
Ss with corpus callosum damage. 
Goldstein’s (1941) concept of the 
“concreteness” of the brain damaged 
may also be based on some retarda- 
tion in SG responsiveness. A study 
by Mednick and Wild (1958) on 
cerebral palsied children indicates 
an accompanying deficiency in SG 
responsiveness. 


Degree and Type of Training 


SG has been regarded as a spread 
of habit strength from the CS to the 
GSs. Thus [...] might influence the 
GSG [...] habit strength of the CR. 
Experiments in this area have 
manipulated either amount or 
distribution of reinforcement of the 
CR. 

Number of reinforced trials. The 
earliest rele[...] Pavlov’s laboratory 
[...] by Razran (1949p), [...] Ss, the 
salivary CR [...] ment of stimuli as 
C[...] number of training trials to 
the CS. 

Beritoff (1924), experimenting with 
dogs, reported a “spread of gen- 
eralization,” that is, as training con- 
tinued, tones further and further re- 
moved from the CS could elicit the 
response. However, after three days 
of training generalization ceased to 
spread. There was considerable vari- 
ation among Ss; no statistical an- 
alysis was reported. 

In a study by Hovland (1937c), 
different groups were given 8, 16, 24, 
or 48 reinforcements. Generaliza- 
tion occurred at all levels of training; 
relative generalization (i.e. ratio of 
generalized response to conditioned 
response strength) increased with 
training and then decreased. Ss re- 
ceiving 16 reinforcements generalized 
not only more than those receiving 
less reinforcement, but also more 
than those receiving more reinforce- 
ment. It was also found that the 
more reinforcement Ss received, the 
more rapidly the generalized re- 
sponse extinguished. Hovland (1935) 
conducted a similar experiment, the 
results of which again indicate that 
relative generalization increases from 
8 to 16 reinforcements and then de- 
creases. 

Razran (1940) experimented with 
the conditioned salivary response in 
humans using the increase in weight 
of cotton dental pads placed in S’s 
mouth as the response measure. Con- 
ditioning consisted of short periods 
during which Ss ate while the appro- 
priate CS was presented. In the tests 
for generalization, which were held 
at the end of each of the eight train- 
ing sessions, different colored lights, 
buzzers and monosyllabic words were 
substituted for the CS. There was 
generalization to each stimulus at 
each degree of training. Relative 
generalization increased with train- 
ing and then declined; the decline 
was more marked when single stimuli 
were used as CS than when combina- 
tions of stimuli were used. The 
interpretation of the results of these 
experiments is somewhat difficult. 
For one thing, while Razran states 
that there were from 2 to 16 Ss in 
each study, 22 of the 26 studies had 
only 2 Ss. Further, when summariz- 
ing the various results he takes the 
mean of a group of percentages based 
on differing numbers of Ss. 

In a study by Spiker (1956a) chil- 
dren were trained to push a lever in 
order to get marbles which they 
could exchange for toys. Pushing 
the lever was rewarded with marbles 
only when a particular color of light 
was presented in contiguity with the 
response; each correct response was 
rewarded. Two groups of Ss, equated 
for response frequency, were given 
either 12 or 24 training trials. At the 
end of the training period generaliza- 
tion was tested with different colored 
lights. The group that received more 
reinforcement responded to the gen- 
eralization test stimulus more often. 

The experiments discussed so far 
dealt with the effect of amount of 
reinforcement on the amount of gen- 
eralization, the latter being tested 
with only a single GS. An excellent 
study by Margolius (1955) investi- 
gated the relationship between num- 
ber of reinforcements and the gradi- 
ent of generalization. Using the 
Grice apparatus, Ss (white rats) were 
trained on a 79 sq. cm. circle and 
tested on circles of 63, 50, 32, and 20 
sq. cm. Groups of Ss were given dif- 
ferent numbers of reinforcements 
ranging from 4 to 104. In testing for 
generalization, three measures were 
used: number of responses in 30 
trials, latency of the first three test 
trials, and number of responses in 60 
sec. Absolute and relative generaliza- 
tion increased with training, the 
smoothest gradients being obtained 
from the latency measure. 

In the studies reported, SG re- 
sponsiveness increased as a function 
of the number of reinforced training 
trials. On the other hand, those 
studies using human Ss and PGR 
or salivary CR measures found rela- 
tive generalization increasing up to a 
point and then decreasing. Mar- 
golius studied rats running in the 
Grice maze apparatus and reported 
a direct relationship between rela- 
tive generalization responsiveness 
and number of reinforced training 
trials. These differences in results 
are ascribable to too many differ- 
ences in procedure and sampling to 
allow for unequivocal interpretation. 
However, there is a possible inter- 
pretation. The conditioned PGR is 
highly sensitive to the effects of 
adaptation, far more so than rein- 
forced maze running on the part of 
hungry rats. After 24 or 48 rein- 
forced trials, a gross overall adapta- 
tion of the PGR would not be unex- 
pected. This could easily have 
masked any possible advantage that 
the generalized response was gaining 
from increased training. 

Distribution of reinforcement. In a 
study by Humphreys (1939), the 
percentage of reinforcement of the 
CR was varied. With tones as 
stimuli, PGR as CR, Ss received 
either 100% or 50% reinforcement. 
All Ss received the same number of 
trials. The 100% reinforced group 
produced a negatively accelerated 
gradient; the 50% reinforced group 
responded almost equally to all 
stimuli, including the CS, though 
there was a slight decrease in re- 
sponsiveness to the stimulus farthest 
removed from the CS. In a subse- 
quent repetition of this study which 
made use of a voluntary response 
Humphreys found essentially the 
same results (1948). Humphreys 
concluded that it is reasonable to 
predict a positively accelerated GSG 
for the 50% reinforced group when 
the gradient becomes more marked. 
This group also generalized more 
than the 100% reinforced group, 
despite the fact that the latter re- 
ceived twice as many reinforce- 
ments. In explanation, Humphreys 
points out that the 50% group was 
concerned about the shock during the 
generalization trials, while the other 
group was not. Another possible 
explanation of the results is that the 
extinction situation was more similar 
to the training situation for the 
partially reinforced group than it 
was for the 100% reinforced group 
(i.e. from 50% reinforcements to no 
reinforcements is less of a change 
than from 100% reinforcements to 
no reinforcements). 


THEORIES OF STIMULUS 
GENERALIZATION 

Neurophysiological Formulations 

While intuitively it would seem 
that SG is a phenomenon which af- 
fords fertile ground for explanation 
couched in terms of neurophysiology, 
little attention has been devoted to 
such attempts. Pavlov (1927) first 
put forth the notion that SG is due 
to a fading wave of excitation irradi- 
ating from the spot on the cortex 
stimulated because of the presenta- 
tion of the CS. As a result of this 
wave of excitation, stimuli whose 
cortical representations are contigu- 
ous to that of the CS also possess the 
potentiality of eliciting the CR. Due 
to the fading strength of the wave, 
the response eliciting power of these 
stimuli decreases as a function of 
their distance from the cortical 
representation of the CS. 

Loucks (1933) and Denny-Brown 
(1932) have developed analytical 
arguments which seem to cast doubt 
on the “irradiation” hypothesis. 
Even more damaging, however, is 
the data from a study by Grant and 
Dittmer (1940). They point out that 
the cortical representation of the 
hand is considerably greater in area 
than that of the trunk of the body. 
From this, they infer that irradiation 
will suffer a greater decrement be- 
tween two comparable points on the 
hand than between two points on 
the trunk. Their data showed the 
gradient for the trunk to be slightly 
steeper. Grant and Dittmer also 
point to the Bass and Hull study as 
tending to refute the irradiation 
hypothesis. Bass and Hull (1934) 
report a smooth gradient going from 
shoulder to feet. In terms of irradia- 
tion theory this is extremely surpris- 
ing since the cortical representations 
of the thigh, calf and foot are inter- 
posed between the cortical repre- 
sentation of the waist and buttocks. 
Some aberrancy in the GSG should 
be observed as a result of this. While 
these data and analytic arguments 
cast strong doubt on the usefulness of 
the irradiation hypothesis as it 
stands, it certainly is appealing for 
heuristic reasons and may prove im- 
portant in some modified form. 

Wolpe (1952) has attacked the 
problem from the standpoint of ex- 
citation of common neural pathways. 
He gives an example from experi- 
ments on pitch generalization. It is 
known that tones which are similar 
to each other in terms of cps will 
tend to excite overlapping parts of 
the relevant end organ of hearing. 
Further, this end organ overlap is 
mirrored at higher neural levels such 
as the medial geniculate body and 
the cerebral cortex. Thus, a response 
learned to one of these tones may be 
called forth by the other tone by 
means of these common stimulated 
pathways. Tone A might stimulate 
Nerve Endings 1, 2, and 3 with a 
peak at 2. Tone B will stimulate 3, 
4, and 5, with a peak at 4. Any re- 
sponse called forth by Tone A will 
also be elicited by Tone B because of 
Common Pathway 3. A problem 
which quickly arises in this formula- 
tion is the usual finding of octave 
generalization, i.e., incremented re- 
sponse to the octave of the CS despite 
the fact that the tones are well sep- 
arated on the cps dimension. Wolpe 
explains this as being the result of 
overtones having the frequency of 
the CS being produced by presenta- 
tion of its octave. This explanation 
suffices for this aspect of the Hum- 
phreys (1939) experiment (which 
Wolpe cites by way of illustration) 
but does not explain the Blackwell 
and Schlosberg (1943) finding of a 
strong-octave effect in rats despite 
extensive precautions taken to min- 
imize the appearance of such over- 
tones. Wolpe would also have some 
difficulty explaining increments of 
generalization response to symmetri- 
cal stimuli (Gibson, 1939; Anrep, 
1923). Further, the Humphreys ex- 
periment, cited by Wolpe, itself con- 
tains data damaging to his theory. 
Humphreys’ experiment fully repli- 
cated an earlier experiment by Hov- 
land except for one important detail. 
Hovland tested across the pitch 
dimension using a range of 4 tones 
from 1967 to 153 cps. Humphreys 
only tested between 1967 and 1000 
cps. Since the representation of 1967 
and 153 cps are more widely sep- 
arated on the hearing end organ than 
1967 and 1000 cps, we are clearly in 
a position to predict that in the in- 
stance that 1967 was the CS, the 
1000 cps test tone of the Humphreys 
study should elicit more relative gen- 
eralization than the 153 cps tone of 
the Hovland study. As it happens 
the relative generalization strength 
of Humphreys’ 1000 cps tone was 
.72; the relative generalization 
strength of Hovland’s 153 cps tone 
was .70. Humphreys suggests that 
this similarity of response strengths 
might be ascribed to a “frame of ref- 
erence” effect since the tones were 
both end points of the pitch dimen- 
sion in their respective studies. These 
data and such an explanation are not 
compatible with the Wolpe position. 
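
The common-pathways example given above (Tone A exciting Nerve Endings 1, 2, and 3; Tone B exciting 3, 4, and 5) can be written out as a small sketch. The function names and the third, well-separated tone are hypothetical additions for illustration; only the membership of the two sets comes from the example in the text.

```python
def shared_pathways(tone_a, tone_b):
    """Nerve endings excited by both tones (the 'common pathways')."""
    return set(tone_a) & set(tone_b)


def predicts_generalization(tone_a, tone_b):
    """Under the common-pathways idea, a response learned to one tone
    can be elicited by another only if they share at least one pathway."""
    return len(shared_pathways(tone_a, tone_b)) > 0


tone_a = {1, 2, 3}          # from the text's example, peak at 2
tone_b = {3, 4, 5}          # from the text's example, peak at 4
distant_tone = {9, 10, 11}  # hypothetical tone far away on the cps dimension

common_ab = shared_pathways(tone_a, tone_b)
near_case = predicts_generalization(tone_a, tone_b)
far_case = predicts_generalization(tone_a, distant_tone)
```

Read this way, a tone with no overlap should draw no generalized response, which is why findings such as octave generalization and the nearly equal strengths at 1000 and 153 cps are awkward for the formulation.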

The common pathways notion is 
similarly damaged by consideration 
of the dimension of intensity. Since 
degree of intensity of a stimulus is 
not reflected in the firing of a specific 
neurone it is not clear how common 
pathways can develop in the in- 
tensity dimension. 


Form of the GSG 


There have been a number of at- 
tempts to specify the form of the 
GSG. These attempts have often 
made the explicit or implicit assump- 
tion that a form exists which is in- 
variant across dimensions. In addi- 
tion, many have maintained, with 
Hovland, that “the form of the gradi- 
ent is of considerable importance in 
psychological theory” (1937b). Cer- 
tainly such work as Spence’s analysis 
of transposition, Miller’s displace- 
ment theory and work on summation 
of gradients could proceed from 
firmer foundations if the form or 
forms of the GSG were better de- 
lineated. 

Unfortunately there has not been 
good agreement regarding the shape. 
In fact, just about all reasonably 
imaginable shapes have been pro- 
posed as “the” form of the gradient 
(including no constant form at all). 

While this question of shape has 
consumed paragraphs in many a dis- 
cussion of experimental results, it 
has only occasionally assumed the 
status of a central problem. This is 
perhaps as it should be for the true 
shape may well be an ephemeral 
treasure. One dismaying notion is 
the ease with which any obtained 
shape may be altered by simply 
manipulating the units of the axes. 
It seems clear that without some 
specification of stimulus and response 
measurement scales, discussion of the 
shape of obtained gradients must 
proceed very cautiously. This fact 
has been pointed out by Guttman 
in an interesting treatment of this 
and other SG problems (1956). 

An approach to the problem of the 
shape of the GSG in terms of a 
mathematical model is presented by 
Shepard (1958a) and discussed be- 
low. 


Mathematical Formulation of SG 


The application of mathematical 
models to SG has been impeded by 
the absence of a satisfactory inde- 
[...] i.e., a mathemati- 
cally precise measure of psychologi- 
cal similarity of stimuli. Recently, 
[...] two kinds [...] 

[...] Mosteller (1951) is based upon 
a set-theoretic model, and views a 
stimulus as a collection of elements. 
The similarity of two stimuli is 
defined in terms of the fraction of 
these elements that the two 
stimulus-sets have in common. 

Following Estes (1950) [...] re- 
sponse. The probability that any 
new stimulus will elicit this response 
is proportional to the similarity of 
the new stimulus to the conditioned 
stimulus. An index of similarity is 
determined by application of a set- 
theory equation to previously ob- 
tained data. By introducing further 
assumptions about the effects of rein- 
forcement and extinction on the 
primitive elements, set-theoretic 
models of this kind have been shown 
to account for certain ways in which 
generalized response probabilities 
change during the course of dis- 
crimination learning (Green, 1958; 
Restle, 1955). However, Bush and 
Mosteller conclude that their model 
cannot make any experimentally 
testable prediction about the shape 
of the GSG since the index of sim- 
ilarity (that is presumed to govern 
SG) is “very much organism deter- 
mined” and, hence, has no invariant 
relation to the distance between 
stimuli along “such physical dimen- 
sions as light or sound intensity, fre- 
quency, etc.” Restle and Beecroft 
(1955) have pointed up the similarity 
“between the Hull-Spence theory 
and the Bush-Mosteller mathemati- 
cal model in the study of stimulus 
generalization” as it is affected by 
differences in anxiety level. 
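
A set-theoretic similarity index of the kind described above can be sketched as follows. This is an illustrative reading, not Bush and Mosteller's own equation: dividing the shared elements by the size of the trained stimulus set is an assumption, as are the function names, the element labels, and the response probability used.

```python
def similarity_index(trained, test):
    """Fraction of the trained stimulus's elements that also belong
    to the test stimulus (ASSUMED normalization, for illustration)."""
    trained, test = set(trained), set(test)
    if not trained:
        raise ValueError("trained stimulus must contain at least one element")
    return len(trained & test) / len(trained)


def generalized_response_probability(p_cs, trained, test):
    """Response probability to a new stimulus, taken as proportional
    to its similarity to the conditioned stimulus."""
    return p_cs * similarity_index(trained, test)


# Hypothetical element sets: the stimuli share 4 of the CS's 10 elements.
cs_elements = set(range(0, 10))
gs_elements = set(range(6, 16))

eta = similarity_index(cs_elements, gs_elements)
p_gs = generalized_response_probability(0.9, cs_elements, gs_elements)
```

Nothing in the sketch ties the element sets to a physical scale, which is the sense in which such an index is organism determined and makes no prediction about the shape of the GSG along a physical dimension.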

Shepard (1957) has proposed a 
model for SG that regards a stimulus 
as a point in metric space. The simi- 
larity of two stimuli is defined in 
terms of the distance between these 
stimulus points in their common 
space. He supposes that stimuli dif- 
fering along a single physical dimen- 
sion (e.g., tones of various fre- 
quencies) can be identified with 
points in a “psychological space” in 
such a way that the distance between 
points is invariantly related to SG. 
Such a curve in psychological space 
differs from the rectilinear physical 
scale by some “continuous, differ- 
entiable transformation.” Therefore, 
although the GSG is postulated to 
have an invariant form in this psy- 
chological space, when this space is 
retransformed back into the “phys- 
ical space” (e.g., into the physical 
scale of frequency) the form of the 
GSG becomes irregular and depends 
upon the position of the CS. 
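
The argument of the preceding paragraph can be made concrete with a small numerical sketch. Two things here are assumptions chosen only for illustration and are not taken from Shepard: the logarithmic transformation from frequency to psychological position, and the exponential decay of the gradient with psychological distance (including the rate constant `k`).

```python
import math


def psychological_position(freq_cps):
    # ASSUMPTION: a log2 transformation stands in for the unknown
    # "continuous, differentiable transformation" of the physical scale.
    return math.log2(freq_cps)


def gradient(cs_cps, test_cps, k=1.0):
    # ASSUMPTION: response strength decays exponentially with distance
    # in psychological space; k is an arbitrary rate constant.
    d = abs(psychological_position(test_cps) - psychological_position(cs_cps))
    return math.exp(-k * d)


# Same PHYSICAL separation (200 cps) from two different CS positions.
low_cs_physical = gradient(400.0, 600.0)
high_cs_physical = gradient(1600.0, 1800.0)

# Same PSYCHOLOGICAL separation (one octave here) from the same two CSs.
low_cs_psych = gradient(400.0, 800.0)
high_cs_psych = gradient(1600.0, 3200.0)
```

Under these assumptions the gradient is identical for the two CS positions when test stimuli are spaced in psychological units, but differs when they are spaced in cps, which is the irregularity on the physical scale that the model is meant to explain.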

He then asks this question: Given 
the physical scale with its empirically 
determined but irregular gradients of 
SG, is it possible to recover the psy- 
chological space and, hence, a unique 
GSG having the same form for all 
positions of the CS? Shepard’s con- 
tention is that for any given experi- 
mental procedure the GSG has a 
unique form that is independent of 
the position of the CS and of the 
physical dimension along which the 
test stimuli are arrayed. 

His procedure attempts to dis- 
cover the unique function that will 
convert the conditional response 
probabilities from experiments on 
SG into psychological distances. On 
the basis of several paired-associate 
experiments in which the stimuli 
vary along such physical dimensions 
as size, brightness, color and shape, 
he concludes that the GSG under 
this kind of experimental condition 
is concave upward (Shepard, 1958a, 
1958b). In addition, though, he con- 
cludes that the shape of the GSG de- 
pends upon the distribution of rein- 
forcement so that, if the S is rein- 
forced only part of the time, the 
GSG departs from the pure exponen- 
tial form and becomes more “bell- 
shaped,” i.e., convex upward in the 
vicinity of the CS. This conclusion 
is supported by Humphreys’ results 
(1939, 1948). Although this model 
accounts for the observed form of the 
GSG (when that gradient is plotted 
against psychological distance), it 
does not attempt an account of the 
changes in the degree of SG during 
discrimination learning. In this 
respect, then, the metric model is 
less general than the set-theoretic 
models, like Bush and Mosteller’s, 
which incorporate a learning mech- 
anism. Perhaps the two types of ap- 
proach will eventually be combined 
so as to account both for the shape 
of the GSG and for the temporal 
course of SG. 


Summation of Generalized Response 
Strength 


A theorem of Hull’s (1939) which 
has excited much interest but little 
research suggests that overlapping 
GSGs originating from two or more 
points on the same continuum will 
summate and result in incremented 
generalization responsiveness in the 
area of overlap. Bilodeau, Brown 
and Meryman (1956) studied gen- 
eralization along a visual-spatial di- 
mension with a task in which Ss gave 
an instructed voluntary response. 
They found a summative effect be- 
tween two positive points separated 
by 16° of visual angle but not be- 
tween two points separated by 32° of 
visual angle. They are careful to 
point out that while their procedures 
have a formal correspondence to the 
methods assumed by Hull, they do 
not meet all of his assumptions. 
Guttman and Kalish (1957) point 
out that the Bilodeau, Brown and 
Meryman experimental design per- 
mits summation to be observed at 
six different points on the gradients. 
It occurs at only one of these, the 
single point between the training 
stimuli. This latter finding is re- 
peated in the data presented by 
Guttman and Kalish. They used the 
pigeon pecking apparatus exposing 
different hues on the pecking key to 
test for generalization. Their method 
comes closer to satisfying the condi- 
tions for summation as assumed by 
Hull (e.g., reinforcement used to 
build up habit strength rather than 
instructions) than that of Bilodeau, 
Brown and Meryman. Thus, while 
in both studies summation seems ob- 
servable at points between the train- 
ing stimuli, it is disturbing not to find 
it at any of the other possible points 
on the gradient. 
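
What a summation hypothesis would lead one to expect can be sketched numerically. Everything quantitative below is an assumption made for illustration: the exponential gradient, its peak and slope, the stimulus positions, and the combination rule h1 + h2 - h1*h2 (chosen because it keeps the combined strength below 1; it is not asserted to be the rule of Hull's theorem).

```python
import math


def gradient(training_pos, test_pos, peak=0.8, slope=0.1):
    # ASSUMED gradient shape: exponential decay from the training stimulus.
    return peak * math.exp(-slope * abs(test_pos - training_pos))


def summate(h1, h2):
    # ASSUMED combination rule for two overlapping strengths in [0, 1].
    return h1 + h2 - h1 * h2


positions = list(range(0, 49, 8))   # hypothetical test points, arbitrary units
s1, s2 = 16, 32                     # hypothetical training stimuli

single_1 = {x: gradient(s1, x) for x in positions}
single_2 = {x: gradient(s2, x) for x in positions}
combined = {x: summate(single_1[x], single_2[x]) for x in positions}

# Increment of the combined strength over the larger single gradient.
increment = {x: combined[x] - max(single_1[x], single_2[x]) for x in positions}
```

With these assumed values the increment is largest midway between the training stimuli but remains positive at every test point, so a failure to observe any summation at the other points is not what this simple reading predicts.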


THE LASHLEY, WADE-HULL CON- 
TROVERSY AND SOME NEW 
INTERPRETATIONS 


Hull (1943) postulated that as a 
result of the reinforcement of a re- 
sponse to a stimulus, gradients of 
associative strength develop to non- 
conditioned stimuli which produce 
empirical GSGs. In an uneven ar- 
ticle which, however, contains pene- 
trating criticisms demanding response, 
Lashley and Wade (1946) present a 
strong attack on this Hullian postu- 
late as well as other aspects of re- 
search on SG. One point lies at the 
basis of their criticism; this is their 
assertion that the empirical phe- 
nomena of SG do not mirror under- 
lying gradients of habit strength de- 
veloped during conditioning, but in- 
stead represent a “failure of associa- 
tion.” By this they mean that the 
reason S responds to the GS in the 
test period is that S has not yet been 
conditioned to respond differentially 
to the E-defined relevant aspect of 
the CS. It is only during testing that 
the GSG develops, and this is only 
because of S's attention being di- 
rected to the E-defined dimension. 

Hull’s (1947) otherwise effective 
and informative answer to the Lash- 
ley and Wade paper failed to deal 
with the important “failure of asso- 
ciation” argument. Brown, Bilodeau, 
and Baron (1951) fractionate the 
argument and dismiss it as circular 
and untestable. They point out that 
the only way one can know when S 
has failed to learn to respond to a 
particular stimulus element is when 
S does not respond differentially to 
that characteristic. However, this 
failure to respond differentially is the 
operational definition of SG. Since 
these two are defined by identical 
operations, Brown, et al. argue that 
the Lashley-Wade criticism is cir- 
cular. However, if this criticism 
were to suggest a specific differential 
test, it could avoid this property of 
circularity. 

The “failure of association” notion 
is in one sense similar to Estes’ learn- 
ing theory since it proposes that only 
certain elements of the stimulus 
situation are associated with the 
response on any one learning trial, 
due to the specificity of S's attending 
responses. If, in a test trial, E 
changes only one small aspect of the 
stimulus, S will respond to the aspect 
to which he had attended and which 
remains unchanged. However, if, 
during training, S has by chance
attended to the E-defined relevant
element of the stimulus then S will not
respond when that element is 
changed; no SG will be observed. In 
these terms a specific prediction may 
be made. The greater the number of 
conditioning trials the greater the 
likelihood of S attending to the 
relevant stimulus element and thus 
the greater the likelihood of differen- 
tiation of the CS from the GS. Thus, 
generalization would be expected to 
decrease as a function of the number 
of conditioning trials. As pointed 
out above quite the opposite result 
has been regularly observed. This is 
seen as very telling evidence against 
the failure of association position. 
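
The direction of this prediction can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that on each conditioning trial S attends to the E-defined relevant element independently and with a fixed probability p; neither the independence assumption nor any value of p comes from Lashley and Wade, and the function name is hypothetical.

```python
def prob_attended(p: float, n_trials: int) -> float:
    """Probability that S attended to the relevant element on at least
    one of n_trials conditioning trials, assuming an independent
    per-trial attention probability p (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    return 1.0 - (1.0 - p) ** n_trials


# On the failure of association view, S generalizes only while the
# relevant element has gone unattended, so the expected amount of
# generalization should fall as training is extended.
for n in (1, 5, 20, 80):
    attended = prob_attended(0.05, n)
    print(n, round(attended, 3), round(1.0 - attended, 3))
```

Whatever value of p is assumed, the second printed column rises and the third falls with the number of trials, which is the decrease in generalization that the failure of association position would lead one to expect.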

Growing from the failure of asso- 
ciation argument is the assertion 
that the GSG is a product of the test
period and develops as a result of
S's attention being directed to the 
E-defined relevant dimension. Lash- 
ley and Wade assert that no falling 
GSG will be observed unless the CS 
is contrasted with a GS so that the 
relevant dimension is made manifest 
to S. Since underlying gradients of 
response strength cannot develop in 
the conditioning period, “training
with one stimulus fails to produce a
significantly greater strength of
association with that stimulus than
with others on the same dimension.”
Lashley and Wade specifically sug- 
gest the method of single stimulus 
training (followed by a single test 
with a GS) as a crucial test. Such a 
test has since been repeated many 
times over, resulting in occasional re- 
ports of first test trial gradients. The 
Grandine and Harlow (1948) study 
was specifically designed with the 
Lashley and Wade restrictions in 
mind. Regular GSGs were observed 
on the first test trial. However, in 
this instance it should be noted that 
Grandine and Harlow are in the 
minority. Most experiments have 
not found a falling gradient on the 
first test trial. More often there is no 
difference between the CS and the 
GSs. 

Razran (1949) has proposed an- 
other interpretation of generaliza- 
tion phenomena. He suggests that 
there are two types of generalization:
“(a) pseudogeneralization and
(b) true generalization.” The Lashley- 
Wade failure of association position 
refers to pseudogeneralization phe- 
nomena where S, because of age, in- 
firmity, or circumstances, is unable 
to note distinguishing characteristics 
of the CS. “The relation of pseudo- 
generalization to true generalization 
is not unlike that of the undifferen- 
tiated total action of the young foe- 
tus to the structured whole activities 
of the fully developed individual.”

In his “subsequent testing hypoth- 
esis” Razran further asserts that 
“generalization develops, not during 
the original training of the condi- 
tioned stimuli, but during the sub- 
sequent testing of the generalization 
stimuli.” This hypothesis is similar 
to the Lashley-Wade position on this 
matter. Razran is suggesting that 
during training no tendencies or 
propensities to generalization are 
built up. During testing with the 
GSs, Ss will “categorize or rate the 
new stimulus on some sort of crude 
similarity-dissimilarity scale” con- 
sisting of two or three steps. 

The present writers hold a view 
somewhat intermediate between that 
of Lashley, Razran and Hull. Before 
developing this view it may be help- 
ful to analyze some of the assump- 
tions of what we will call, for want of 
a better name, the Hull position. It
should be made clear that this elab- 
oration of Hull’s statements is the 
writers’ responsibility. It is not im- 
possible that in order to draw a 
sharper contrast they have in cer- 
tain instances exaggerated Hull’s 
position.

Vigorous generalization responses 
are routinely observed on the first 
test trial following single stimulus 
training. Inasmuch as such responses 
would not occur in the absence of 
this training and the degree of such 
training has a marked effect on the 
degree of generalization behavior, the 
phenomenon of stimulus generalization
is probably best seen as resulting
from the operations involved in 
training. 

On the first test trial following
training (generally acknowledged to
be the “purest” measure of SG) one 
should expect to find a regular de- 
scending GSG. Instead, the GSs 
usually elicit almost as much response
strength as the CS (“complete” SG).
Hull suggests that this
is due to strong responsiveness oc- 
curring immediately after training 
(not necessarily a convincing argu- 
ment in view of the action of reactive 
inhibition which would tend to lower 
responsiveness) and to unextinguished
reactions to irrelevant situational
stimuli such as apparatus
clicks. Wickens, Schroder and Snide 
(1954) attempted to extinguish such 
responses to an irrelevant aspect of 
the stimulus before beginning SG 
testing, but nevertheless found com- 
plete SG on the first test trial. 

The expectation of a descending
GSG on the first test trial carries
with it some assumptions. First, one
must assume that SG [...]
develops as a consequence of training.
This seems [...] assumption.
The second assumption is that
training results in a gradient
of apportionment of potential
responsiveness along the [...]
stimulus continuum. For example, it is
expected that if Ss are trained to 
salivate to a tone 1000 cps and each 
S is presented once with a tone at 
some other point on the cps continuum,
the extent of each S’s response
will be proportional to the
physical or jnd distance of his test 
stimulus from the training stimulus. 
If S is to respond in a proportional 
manner (i.e., respond more to 1400 
cps than to 1800 cps) it must be as- 
sumed that he came to the first test 
trial with associations between spe- 
cific stimulus values on a physical or 
jnd scale and certain levels of re- 
sponse strength. That is, it must be 
assumed that before the first test 
trial he already had a certain num- 
ber of drops of salivation set aside 
for 1400 cps and a smaller number 
reserved for 1800 cps. 

However, there is another possibility.
What if, following training, S
set up associations between
generalization responses and an ordinal
scale of stimulus values rather than a 
jnd or continuous physical scale? 
That is, what if the extent of S’s re- 
sponse to a given GS was propor- 
tional to the number of stimulus units 
that separate that GS from Stimulus 
0 (with number of stimulus units de- 
fined in terms of the population of 
stimulus units S has experienced 
within the immediate experimental 
situation)? This would explain the 
fact that almost complete SG is 
usually observed on the first test 
trial; the first GS that S experiences 
that is on the same continuum as 
Stimulus 0 will be, at that point, 
only one stimulus unit from Stimulus 
0, and will elicit a relatively large
response. For example, let us say that
in an experiment on pitch
generalization, Stimulus 0 is 1000 cps
and one S receives a 1400 cps GS and
another S receives an 1800 cps GS on
the first
test trial. For each S in terms of his 
experience in the experimental situa- 
tion, his GS will be only one unit re- 
moved from Stimulus 0. With con- 
tinued testing S’s GS units hierarchy 
will change until he has experienced 
the full range of GSs used in the ex- 
periment. At this point a regular 
descending GSG would be expected. 
This would in turn explain another
typical finding of generalization
experiments: a regular descending GSG
is usually only found at a relatively
late stage of testing. Also, it is
typically observed that the slope of
the GSG increases with continued
testing.

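
The way the units count for a given GS would grow over the course of testing can be sketched as follows. This is only one possible reading of the units notion: it assumes that a GS lies as many units from Stimulus 0 as there are distinct stimulus values, met so far in the experiment, between Stimulus 0 and that GS (counting the GS itself). The function name and the particular order of test trials are hypothetical.

```python
def units_from_stimulus_zero(test_value, experienced, stimulus_zero):
    """Ordinal units separating test_value from stimulus_zero, counting
    only the distinct stimulus values S has met so far (plus the current
    test stimulus) that lie on the same side of stimulus_zero."""
    if test_value == stimulus_zero:
        return 0
    seen = set(experienced) | {test_value}
    seen.discard(stimulus_zero)
    if test_value > stimulus_zero:
        between = [v for v in seen if stimulus_zero < v <= test_value]
    else:
        between = [v for v in seen if test_value <= v < stimulus_zero]
    return len(between)


# Stimulus 0 is 1000 cps; the order of GS presentations is made up.
experienced = []
for gs in (1800, 1400, 1200, 1600, 1800):
    print(gs, units_from_stimulus_zero(gs, experienced, 1000))
    experienced.append(gs)
```

On its first presentation the 1800 cps tone is only one unit from Stimulus 0; once S has met the full range of GSs the same tone is four units away, so that a descending gradient can emerge only late in testing.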
Let us take another situation 
where some differential predictions 
may be made. The Hull position, as 
stated above, would tend to predict 
that the amount of responsiveness 
elicited by a GS will be determined by 
its distance from the CS along a
physical or jnd scale without regard 
to what other GSs, if any, are pre- 
sented in the experiment. In terms of 
the alternative notion being pre- 
sented here, if the CS is a tone of 
1000 cps and 1600 cps is the critical 
test GS, it will make a great differ- 
ence whether the other GSs are (a) 
1200 cps and 1400 cps or if the other 
GSs are (b) 1800 cps and 2000 cps. In 
the case of (a), the GS at 1600 cps 
will be three units from the CS; in the 
case of (b), 1600 cps will be one unit 
from the CS. If Ss allot their gen- 
eralization responsiveness according 
to units of separation of GSs from the 
CS then it would be predicted that 
the 1600 cps stimulus would elicit 
more responsiveness in the (b) situa- 
tion than in the (a) situation. 
Further, it would be predicted that 
an important determinant of amount 
of response to any of the GSs would 
be their units separation from the 
CS. Thus, the 1200 cps GS in (a) 
should elicit about as much gener- 
alization as the 1600 cps GS in (b) 
since both are one unit of separation 
from the CS. Some empirical sup- 
port for this interpretation of gen- 
eralization phenomena is found in a 
comparison of studies by Hovland 
(1937) and Humphreys (1939). Hov- 
land reports the generalization of a 
PGR response to tones 25, 50, and 
75 jnds from the CS. Part of the 
Humphreys experiment repeated the 
Hovland procedure, confining itself 
to GSs that were 5, 15, and 25 jnds 
from the CS. The CS and 25 jnd 
tones were 1967 dv and 1000 dv,
respectively. The two studies are
compared in terms of the units
hypothesis in Table 1. The units
hypothesis would predict that the
Humphreys 25 jnd tone would elicit as
much response as the Hovland 75
jnd tone (both being three units from
the CS). The generalization
responsiveness in the two studies may
be roughly compared in terms of
relative generalization (response to
GS/response to CS). The Humphreys 25
jnd GS/CS ratio was .72; the Hovland
75 jnd GS/CS ratio was .70.
The 25 jnd Hovland GS (one unit of
separation) elicited more relative
generalization than the 25 jnd
Humphreys GS (three units of
separation).

TABLE 1
GENERALIZATION STIMULI IN JNDS FROM CS

Hovland      CS   25   50   75
Humphreys    CS    5   15   25
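
The unit counts used in these comparisons can be checked with a short sketch. It assumes that units of separation are simply the rank order of the GSs by distance from the CS within the set of stimuli used in a given experiment; the function name is hypothetical, and the two ratios are the ones quoted in the text.

```python
def units_of_separation(cs, stimuli):
    """Map each GS to its ordinal distance (in units) from the CS, given
    the full set of GSs used in the experiment. All GSs are assumed to
    lie on one side of the CS, as in the examples in the text."""
    ordered = sorted(set(stimuli), key=lambda value: abs(value - cs))
    return {value: rank for rank, value in enumerate(ordered, start=1)}


# The 1600 cps test tone under the two hypothetical sets of GSs.
condition_a = units_of_separation(1000, [1200, 1400, 1600])
condition_b = units_of_separation(1000, [1600, 1800, 2000])
print(condition_a[1600], condition_b[1600])

# Hovland and Humphreys GSs, expressed in jnds from the CS (CS = 0).
hovland = units_of_separation(0, [25, 50, 75])
humphreys = units_of_separation(0, [5, 15, 25])
print(hovland[75], humphreys[25], hovland[25])

# Relative generalization (GS/CS) as reported in the text.
reported_ratio = {"Humphreys 25 jnd": 0.72, "Hovland 75 jnd": 0.70}
print(reported_ratio)
```

The two tones that are three units from the CS (Humphreys 25 jnd, Hovland 75 jnd) are the ones with nearly equal reported ratios, although they differ by 50 jnds in distance from the CS.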

Humphreys termed this the “frame
of reference” effect, stating that the
“generalized response may depend
upon the setting in which stimuli are
perceived.” The writer has recently
completed a number of studies testing
the units hypothesis in SG in a
number of continua (spatial, tactual,
and temporal) and in terms of
predictions regarding the summation of
gradients (1959). An early study
(1956) which replicated the
Humphreys-Hovland comparison in the
spatial continuum using the Brown,
Bilodeau, and Baron (1951) apparatus
lends strong support to the units
interpretation.

This research on the units hypoth- 
esis has shown that this is an im- 
portant variable in determining gen- 
eralization behavior. However, it 
has been noted that the predictions 
of the units variable tend to break 
down at the extremes of the con- 
tinuum being tested. An S trained to 
respond to 50 cps and tested at
20,000 cps will not behave the same
as an S trained at 50 cps and tested at
100 cps despite the fact that the test
stimulus is one unit of separation
away in both cases.


Over the years, Cattell has outlined
and followed a strategy for determining
and assessing personality
traits which initially called for the 
identification of factors in three 
media, namely, behavior ratings (BR 
or L data), questionnaires (Q) and 
objective tests (T). Factor identi- 
fication was to be based upon blind 
rotation to simple structure, which 
according to Cattell has the virtuous 
property of revealing functional uni- 
ties which are invariant within the 
tolerances set by sampling and meas- 
urement errors. Implicit in Cattell’s 
strategy was the hope that the vari- 
ous media would reveal similar struc- 
turings and thus give evidence, in 
terms of converging operations, that 
a structural model had been found 
which had a claim to fitting reality 
better than other possible models. 

Assertions made in recent publica- 
tions by Cattell and his co-workers 
would lead the reader to believe that 
research has demonstrated an al- 
most one-to-one correspondence be- 
tween BR and Q personality factors. 
These assertions also carry with them 
the implication that Cattell’s long 
sought for evidence of factor in- 
variance across media has been 
found, at least for BR and Q factors. 
In his most recent book (1957) Cat- 
tell summarizes the present state of 
his research on cross-media matching 
with the following statement: 

The upshot of this cross-media matching is 
clearly that L and Q factors are far more com- 
pletely mutually matched than either is with 
T factors. Only L-data factors J and K, and 
the factors noted before to be specific to
Q-data, namely, Q1, Q2, Q3, and Q4, fail of
cross-matching (p. 325).


Cattell hedges this conclusion by 
presenting possible alternative cross- 
matches for 3 of 12 factors for which 
matching is claimed (Cattell, 1957, 
Table 8-7, p. 326) and by stating in 
his preface (p. xi) that the matching 
across media may be even less re- 
liable than it seems. However, an 
even stronger statement on this sub- 
ject appears in a recent paper by 
Scheier and Cattell (1958). They 
write: “Factors in the rating and 
questionnaire realm have been se- 
curely linked to each other with all 
but a few factors now in one realm 
finding their counterparts in factors 
in the other realm” (p. 608). Further
along in this paper, the following 
statement is made: “Moreover, the 
linkage between questionnaire and 
rating realms is so well established 
that a match to a questionnaire fac- 
tor is excellent presumptive evidence 
of a match to a parallel rating factor” 
(p. 609). 

Since the present writer had quite 
a different impression of the present 
status of research on the matching of 
BR and Q factors from that given in 
the preceding quotations, he under- 
took the present review of the evi- 
dence and issues involved in relating 
BR and Q factors. 


THE EVIDENCE 


Scheier and Cattell (1958), and 
Cattell, Saunders, and Stice (1957) 
make reference to Cattell (1957) in 
support of their assertions quoted 
above. Cattell (1957) cites three 
cross-media studies, two with adults 
and one with children. 

The first cross-media study was 
carried out in 1945-46 by Cattell and
Saunders (1950). The sample con- 
sisted of 118 college men and 240 
college women. The behavior rating 
factor scores were based on one to 
five rating variables each, generally 
those found common to two previous 
studies (Cattell, 1947; Cattell, 1948).
On the average, three rating variables 
were used to estimate each of 13 be- 
havior rating factors. Questionnaire 
factor scores were based on one ques- 
tion for eight factors (Q-1, Q-2, Q-3, 
Q-17, Q-9, Q-4, Q-13, and Q-14) and 
on five questions for the remaining 
eight factors (Q-10, Q-5, Q-8, Q-6, 
Q-7, Q-15, Q-?, and Q-12). Objective
tests were also included, but these
are not of interest for the present
discussion. The analysis was based on
tetrachoric r’s and used the multiple
group centroid method. Thirty-one
visual rotations were made before a
satisfactory simple structure was
reached. Out of the 12 factors
extracted, the authors claim finding a
match for two BR and Q measures.

In setting out some of the results
below, brief descriptions of the
variables entering into the behavior
rating factor estimates are given in
parentheses. Since each rating variable
is actually a complex description,
commas are used to separate adjectives
within a rating variable and semicolons
are used to separate rating variables.
The description of only one pole of the
bipolar variables is given. A fuller
description of the variables can be
found in Cattell (1947). The number of
questions entering into each
questionnaire factor estimate is given
in parentheses following the factor
title. The actual questions used can be
found in Saunders (1949) or Cattell
(1950a).

The first factor for which a match
is claimed is Factor 8 which had the
following loadings of BR and Q factor
scores:

Q-13   Obsessionally careful and considerate (1)
BR-G   [...]
BR-K   Socialized, cultured mind (intellectual, [...] polite, refined in dealing with [...] ignores people; aesthetically fastidious)
BR-N   Sophistication (polished [...]
Q-17   Lack of annoyance at superiority (1)

The authors claim a match between
BR-G and Q-13. When one examines
the original correlations (Saunders,
1949), it is found that BR-G and
Q-13 correlate .02. The highest
correlation of Q-13 with any BR measure
is .22 with BR-K.

The second claim for a match is
based on Factor 3 which has the
following BR and Q loadings:

BR-L   Paranoid schizothymia ([...]g, anxious; shy, bashful, sensitive; suspicious, broods, feels persecuted)   .41
Q-17   Common annoyance at superiority (1)   .42

The authors have this to say about
this factor: “[...] any factor, but also
one of the most clear-cut matches of
primary factors from each of the realms
of data” (Cattell & Saunders, 1950,
p. 251).
The correlation between Q-17 and 
BR-L (Saunders, 1949) is .01. The 
highest correlation of Q-17 with any 
BR factor is —.24 with BR-K. 

Three other relationships between 
BR and Q factors are assumed to 
exist, although the reported data leave 
much doubt as to the validity of 
these assumptions. Factor 4, which 
Cattell and Saunders assume is a 
second-order fusion of factors H and 
F, had the following loadings: 


BR-H3  Adventurous cyclothymia (sum of H2 and H1 below)¹   .7[?]
BR-H2  Adventurous cyclothymia (marked interest in opposite sex; gregarious, sociable, likes parties)   .73
BR-H1  Adventurous cyclothymia (composed, free from shyness)   .66
BR-D1  Infantile sthenic emotionality (demanding, impatient, self-centered; attention-getting, shows off, brags)   .72
BR-D2  Infantile sthenic emotionality (self-willed, egotistic, headstrong, predatory; unscrupulous, not conscientious, tends to be dishonest, selfish)   .66
BR-F   Surgency (energetic, alert, quick, spirited; cheerful, optimistic, enthusiastic; adventurous, bold, happy-go-lucky; talkative)   .64
BR-C   General emotionality (neurotic fatigue, irrationally irritable, jumpy, nervous symptoms; emotional, excited easily; changeable, undependable, moody)   -.63
Q-2    Lack of shyness (1)   [?].52
BR-G   Lack of positive character integration (see Factor 8 above)   [?]
Q-10   Rhathymia (5)   [?]

Cattell and Saunders have the
following to say about this factor:

There is reason to believe that this represents
a failure of analysis. Scattered evidence, in
alternative rotations of the present data,
indicates that Q-2, lack of shyness, is the Q factor
equivalent of (BR) H, and that Q-10 (Guilford’s
Rhathymia) is the match of (BR) F
surgency. Here all four are involved in a
single factor along with (BR) D. This almost
certainly represents a second-order factor, due
to the correlations among our factors being
unduly high, as a result of biased estimates,
or to a failure to extract sufficient factors to
obtain separation of H and F (p. 251).

Cattell and Saunders present no
correlations or factor plots which might
serve to bolster their tenuous
arguments. In the reported data there is
no basis for deciding what Q factors
go with what BR factors at the primary
trait level. The evidence suggests at
best that a second-order extraversion
factor in behavior ratings may have a
Q-data counterpart. However, the
present associations are not consistent
with those reported by Cattell later
(1957) as components of the
second-order extraversion factors in
either BR or Q realms. Examination of
the original correlation matrix
(Saunders, 1949) gives no further
support to the assumed matches. Q-10
correlates .26 with BR-H3 and .20 with
BR-F. Q-2 correlates -.44 with BR-H3
and -.39 with BR-F. Q-10 has
correlations of .20 or higher with
BR-G, BR-F, and BR-H, the highest r
being with BR-G. Q-2 has correlations
above .20 with all of the BR factor
scores loading on Factor 4 as presented
above, the highest r being
with BR-H3. The conclusion is
necessary that this study does not
give evidence for the unique linking
of BR-H with Q-2 and BR-F with
Q-10.

¹ The loadings on H3, H2, and H1 are
spurious due to the part-whole
relationships and the resulting
correlated error.


The final assumed relationship 
between BR and Q factors was derived
from the loadings on Factor 10
which were as follows:

BR-I   Sensitive, imaginative, emotionality (demanding, impatient, self-centered; dependent, immature; aesthetically fastidious; frivolous, undependable, thoughtless; sensitively imaginative, intuitive)   .43
BR-G   Lack of positive character integration (see Factor 8 above)   -.40
Q-5    Emotionally sensitive self-sufficiency (5)   .39
Q-3    Interested in understanding nature (1)   .38
BR-N   Sophistication (see Factor 8 above)   .34

Cattell and Saunders assume that Q-5
goes with BR-I, though again they
present no evidence to support their
selection of these particular BR and
Q factors as being matched. The
correlation between Q-5 and BR-I
(Saunders, 1949) is .06. The highest
correlation of Q-5 with any BR factor
is .20 with BR-J, Neurasthenia.

In discussing the results, the
following interesting and appropriate
comments are made:

If the above factorization is doing what we
intended it to do only one conclusion is possible,
i.e., except for two or three instances, the
known personality factors, contrary to our
hypothesis, are not outcrops of the same
factors in different media. This does not mean
that every factor may not have a manifestation
in all three regions; it only means that the
examples we have taken from each region do
not coincide. ... Obviously, it is the
questionnaire factors that lie most in a space of
their own, whereas the behavior rating factors,
..., are rarely uninvolved (Cattell & Saunders,
1950, p. 256).

The authors then go on to point out
methodological difficulties which
might lead to changing this conclusion
at a later date. Among these
difficulties are low reliabilities, the
assumption that means and sigmas on
behavior ratings were approximately
equivalent for men and women, and
the “obtrusive” problem of second-order
factors. In addition, Cattell
and Saunders found that some of
their communalities ran above unity
and that they could not achieve
satisfactory restoration of the original
correlations from the factor loadings.
Whatever the methodological
difficulties, and the present writer agrees
that there were many, this study by 
itself surely cannot be taken as sup- 
portive evidence for the assertions 
concerning successful matching. 
However, this was a “pioneering 
study,” full of methodological pit- 
falls. What then do the two more 
recent studies have to offer in sup- 
port of the claim that there are se- 
cure linkages between corresponding 
BR and Q factors? 

The second study by Cattell and 
Saunders (1955) turns out to be a 
reworking of the same data which 
were used in the first study, except 
that the analysis was limited to the 
240 women. Again tetrachoric cor- 
relations were used. Fifteen factors 
were extracted of which 11 were 
“identified” with those from the 
first study. One new factor appeared 
and three were not interpreted.
Adequate evaluation of this study is
not possible since the writer was not
able to obtain the original correlation
matrix.² However, of the factor
matches assumed to exist in the first
study, only the co-loading of BR-H
and Q-10 (Rhathymia) was replicated
when just the women’s data
were used. Even this result is not
firm since Q-12, masculinity, loads
just as high on the H factor as Q-10
(.41 and .42, respectively). Rather
than detailing the confusing picture
of loadings which appeared in this
second study, the writer would like
to quote Cattell’s discussion of its
results in his recent book (1957).
Cattell writes:


The second study improved on the first by
keeping homogeneity of sex (240 women).
Also, its design reacted to the absence of
“opposite numbers” to certain factors in the first
study by taking out slightly more factors in
each medium, i.e., by searching for factors of
lower variance. Its results were again complex,
but consistent with the first, suggesting
that complexity is not merely chaos or error
(p. 324).


² Personal communication with Cattell
suggested that having the r matrix would
not add much to the picture.


In his discussion of this study, Cat- 
tell gives the reader no indication 
that the same data were being
reanalyzed, and he implies that the
“second” study confirmed a previous
one. Since the second study was
based on two-thirds of the cases from
the first study, some consistency of
findings should be expected. Ac- 
tually the results were not consistent 
with regard to the relations of L and 
Q factor scores. Since a new sample 
was not used in the second study, 
Cattell is hardly justified in saying 
that these results suggest that the 
first study was not merely chaos or 
error. At any rate, this study does 
not add to the evidence for a secure 
linkage between BR and Q factors.

The third study cited in support of
the alleged linking of BR and Q fac- 
tors (Cattell & Beloff, 1956) involved 
153 eleven-year-old boys and girls 
who were rated on behavior factors 
C, E, F, G, and L. Four to 10 vari- 
ables were used to estimate each 
factor. These variables were so 
summed as to give two equivalent 
estimates of each factor. The ques- 
tionnaire factors involved the items 
in the Junior Personality Quiz
(Cattell & Beloff, 1953), 7 to 13 items
being used per factor. Again two
estimates were made of each factor. 
As in the previous studies, objective 
test measures were included, but 
these are not important to the pres- 
ent discussion. After computing 
product-moment correlations, 13 fac- 
tors were extracted using the group 
centroid method and rotated to sim- 
ple structure by the quartimax 
(Neuhaus & Wrigley, 1954) criterion. 

Two additional graphic rotations
were required to obtain satisfactory
simple structure. When the 13 factors
are examined, in no case do both
a Q factor and a BR factor load on the
same factor.

These three studies represent the 
sum of the evidence referenced by 
Cattell in support of the contention 
that BR and Q factors cross match. 
Even if the results of these studies 
were positive, they could not be con- 
sidered conclusive with regard to the 
present 16 PF questionnaire factors, 
because of the unknown relationships 
of the present questionnaire factors 
to those used earlier. 

There are two further unpublished 
studies from Cattell’s laboratory 
which have some bearing on the is- 
sue. Meeland (1952) took as his 
doctoral thesis the problem of dis- 
tinguishing between factors A, F, H, 
and L, all of which fall in the extra- 
version-introversion area. Objective 
test, behavior rating, and question- 
naire data were obtained on 102 col- 
lege men from six housing groups. 
Again the objective test data will not 
be discussed. The rating factor scores 
were obtained by having each mem- 
ber of each group rate every other 
member (N varied from 14 to 22) on 
three variables loading on each fac- 
tor. The full description of the rat- 
ing variables can be found in Cattell 
(1947). Only a brief description of 
each will be given below: 
BR-A  Cyclothymia-Schizothymia
  a. Easy going, good natured            vs. spiteful, critical
  b. Adaptable                           vs. inflexible, rigid
  c. Ready to cooperate                  vs. languid, slow
BR-F  Surgency-Desurgency
  a. Cheerful, happy-go-lucky            vs. depressed, anxious
  b. Talkative                           vs. taciturn, introspective
  c. Energetic                           vs. subdued, languid
BR-H  Adventurous Cyclothymia-Withdrawn Schizothymia
  a. Adventurous, likes to meet people   vs. shy, timid, withdrawn
  b. Marked interest in opposite sex     vs. uninterested in opposite sex
BR-L  Trustful Cyclothymia-Paranoia
  a. Trustful                            vs. suspicious
  b. Free from jealous tendencies        vs. jealous
  c. Socially composed                   vs. bashful


Forms A and B of the 16 PF (Cattell,
1950b) were used to gain questionnaire
measures of factors A, F,
H, and L.

Meeland normalized the distribution
of all variables and used
product-moment correlation coefficients
in his analysis. The multiple group
centroid method of factor analysis was
used. Blind rotation was followed for
11 full rotations, and then inspection
indicated that Factor 1 was a fusion
of F and H. An attempt was then
made to separate F and H by extracting
another factor from the residuals
and rotating further. Six further
rotations of all factors were carried
out. Two of Meeland’s 11 factors
showed co-loadings of behavior rating
and questionnaire factors.

Factor 1 (actually a reference vector)
had the following loadings:

BR-F   [?]
BR-H   .65
BR-L   .56
Q-F    .47
Q-H    .44
BR-A   .43

Factor 3 had the following loadings:

Q-H    .60
BR-H   .95
BR-F   .50
BR-L   .47
Q-A    .36
Q-F    .34
Q-L    [?]


Meeland argues that Factor 1 “is 
best considered as F with consider- 
able contamination with H,” and 
that Factor 3 is H contaminated 
with F. When the present writer 
plotted the loadings on Factors 1 and
3 against each other, he found a 
clear-cut hyperplane nearly orthog- 
onal to the vectors for F, H, and L 
which suggests that the best simple 
structure was contained in Meeland’s 
original Factor 1 which fused F and 
H. It is obvious that his attempt to 
separate F and H by extracting
another factor was an arbitrary pro- 
cedure which led to poorer simple 
structure. When one looks at the 
correlations between the various BR 
and Q factors (Table 1) the reasons 
for his difficulty in separating these 
factors become apparent. BR-F and
BR-H correlate .779, BR-F and
BR-L correlate .790, and BR-H and
BR-L correlate .714. Looking at the
cross-media correlations for A, F, H, 
and L it is found that each of the Q 
factors correlates higher with at least 
one other factor which it is not sup- 
posed to match than it does with 
the factor it is supposed to match. 
Again, the possibility remains that a
second-order BR extraversion factor
might match with a second-order Q
extraversion factor, but this study
does not support the one-to-one
matching of L and Q factors at the
primary trait level.

TABLE 1
CORRELATIONS AMONG BR AND Q FACTORS
(Meeland, 1952)

Variables   BR-A    BR-F    BR-H    BR-L    Q-A     Q-F     Q-H
BR-F        .437    -
BR-H        .425    .779    -
BR-L        .399    .790    .714    -
Q-A         .106    .262    .276    .292    -
Q-F         .180    .490    .523    .285    .139    -
Q-H         .155    .597    .586    .516    .404    .485    -
Q-L         -.268   [?]     [?]     [?]     -.254   -.033   [?]

Note. N is 102 college men. Q-L is scored for the paranoia direction while
BR-L is scored for the trustful cyclothymia direction.

Meeland (1952) appropriately con- 
cluded that, “Generally, there was a 
failure of the ratings, questionnaire 
and objective test measures to emerge 
simultaneously in their appropriate 
factors” (p. 52). 

The correlations
between L and Q factors in this
study, even though not supporting
Cattell’s assertions, are of interest
because they are higher than have
been found in most other studies.
The writer would like to suggest that 
the special testing conditions, in 
which each subject first evaluated 
every other member of his house on 
the rating factors and then was given 
the questionnaire (to be completed 
within 48 hours), set up a situation 
wherein each self-rater was likely 
to be using a similar frame of ref- 
erence, thus making the cross-com- 
parisons of self-ratings more mean- 
ingful. Also, it should be noted that 
each subject had the rating scales in 
his possession a month before the 
actual ratings were made. Having 
the rating scales available might 
have led to closer self-examination 
relative to other members of the 
group and thus increased the validity 
of the self-ratings. Needless to say,
such results, while of theoretical
interest, could not be generalized as
validity data for the 16 PF under
typical conditions of use.

The final study from Cattell’s lab- 
oratory was carried out by Horowitz 
(1951) on 60 college women. Horo- 
witz was concerned primarily with 
finding objective tests to measure 
factors A, H, and L. She included 
behavior ratings found previously to 
load on factors A, F, H, and L (nine 
ratings in all) and the 16 PF (first 
edition). Unfortunately, the only 
correlations reported are between the 
nine behavior ratings and Factor A 
of the 16 PF. Horowitz found correlations
ranging from .43 to .79 between
the 16 PF Factor A and rating
variables of adaptability, good na- 
turedness, cooperativeness, and atten- 
tiveness to people (all found previously 
to load BR-A). On the basis of the 
reported data, the reader does not 
know to what extent these same rat- 
ing variables might correlate with 
other questionnaire factors, and so 
this evidence is of limited value. It 
should be remembered that with col- 
lege men under similar rating condi- 
tions, Meeland found Q-A to corre- 
late only .106 with BR-A. 

In reviewing the literature, the 
writer was able to find only one other 
study (at the primary trait level)
where factored questionnaires and 
factored behavior ratings were used. 
In this study (Becker, Peterson, 
Hellmer, Shoemaker, & Quay, 1959), 
57 mothers and 55 fathers were rated 
on 16 of the Fels Parent Behavior 
Rating Scales. These scales had been 
previously factored by Roff (1949). 
Guilford’s 13 questionnaire factors 
were also given. Separate factor 
analyses were completed for mothers 
and for fathers, using centroid fac- 
tors and the quartimax principle of 
rotation (Neuhaus & Wrigley, 1954). 
It was found that ratings of sociability
loaded .62 on a mother-data factor
which was primarily defined by
loadings of .83 on Guilford’s Rhathymia
(R) and .82 on Guilford’s
Sociability (S). The loadings on R
and S are undoubtedly spurious because
many of the same items contribute
to each factor. The correlations
between the sociability ratings
and S and R were .59 and .40
respectively. The only other factor to
show appreciable loadings of rating
and questionnaire scores was a father-data
factor on which the Fels activeness
rating loaded .69 and Guilford’s
General Activity (G) loaded
.54. The correlation between the
activeness rating and Guilford’s G
was .40.

Most of the Guilford factor scores 
“collapsed” into a single adjustment 
factor very similar to Cattell’s sec- 
ond-order anxiety factor or to the 
second-order emotional stability fac- 
tor reported by Eysenck (1953). This 
questionnaire adjustment factor, 
however, showed little relationship 
to ratings of adjustment or to prob- 
lem behavior in the children of these 
parents. Becker et al. concluded
that the “domain of personality being
tapped by the Guilford inventories is
quite limited.” The reader should
note that even the positive evidence
for the matching of sociability in
mothers (Cattell’s H factor) and activeness
in fathers (no Cattell parallel)
is suspect because the ratings
were based on a one-hour interview.
These ratings were contaminated
with self-reports of behavior, and it is
therefore not too surprising to find
some correlation with other self-report
measures. The surprising and
important part of this study was that
there were not more relationships
between the interview ratings and
questionnaire scores.


Discussion 


It is apparent that the present 
evidence does not support the claim 
for “secure linkage” of BR and Q 
factors. This does not necessarily 
imply that future research using 
more reliable and factor-pure measures
may not still prove Cattell’s
proposition to be correct. However,
there are a number of reasons why
the present writer considers this outcome
exceedingly unlikely. Proof of
Cattell’s propositions requires two
things: (a) that corresponding BR
and Q factor scores be significantly
correlated with each other, and that
these correlations approach unity
when corrected for attenuation, assuming
that somehow rating bias and
response bias factors have been
partialed out; and (b) that when factored
in the same matrix and rotated
blindly to simple structure, the corresponding
BR and Q factor measures
load on the same factors and no other
factors, except possibly rater bias or
response bias factors.
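
Criterion (a) rests on the classical (Spearman) correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The following is only a minimal sketch of that arithmetic; the function name and all numerical values are illustrative assumptions and are not taken from any study reviewed here.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores (Spearman's correction).

    r_xy: observed correlation between the two measures
    r_xx, r_yy: reliability coefficients of the two measures
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: an observed BR-Q correlation of .40 and
# reliabilities of .64 for each factor measure.
corrected = correct_for_attenuation(0.40, 0.64, 0.64)
print(round(corrected, 3))
```

Under these assumed figures the corrected value is .625, still well short of the unity that criterion (a) would require.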

The first criterion is necessary be- 
cause of the inability to state the 
statistical significance of a factor 
loading and because of the attenuat- 
ing effect of unreliability on factor 
loadings. The second criterion is 
necessary to demonstrate the correspondence
of factor structure in
the two media. Cattell has paid little
attention to the “no other factor” 
requirement of the second criterion 
in evaluating his data. If both the 
first criterion and the “no other 
factor” part of the second criterion 
are ignored, as was done in the Cat- 
tell-Saunders papers, one is faced 
with the possibility that allegedly 
corresponding BR and Q factor meas- 
ures could load as high as .70 on the 
same factor when the correlation be- 
tween the factor measures is zero. 
This possibility assumes unit com- 
munalities. However, if the com- 
munalities are only .80, each factor 
measure (BR and Q) could still load 
.57 on the same factor without being
correlated with each other. 
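
The unit-communality case can be checked with a small sketch. It assumes orthogonal common factors, so that the correlation reproduced by the factor model is the sum of the products of the two measures' loadings; the two-factor loading patterns and the names used are hypothetical and serve only to illustrate the argument.

```python
import math

def implied_correlation(loadings_x, loadings_y):
    """Correlation reproduced by orthogonal factors: sum of loading products."""
    return sum(a * b for a, b in zip(loadings_x, loadings_y))

def communality(loadings):
    """Sum of squared loadings across the common factors."""
    return sum(a * a for a in loadings)

a = math.sqrt(0.5)       # about .707

# Hypothetical patterns: both measures load about .70 on the first factor
# and take opposite signs on a second factor.
br_measure = (a, a)
q_measure = (a, -a)

print(round(br_measure[0], 2), round(q_measure[0], 2))   # shared-factor loadings
print(round(communality(br_measure), 2), round(communality(q_measure), 2))
print(round(implied_correlation(br_measure, q_measure), 2))
```

The shared loading is therefore compatible with a zero correlation whenever the measures diverge sufficiently on some other factor, which is why the "no other factor" requirement cannot be dropped.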

Obtaining factor measures which 
could meet the above requirements, 
even if the correspondence exists, 
would be exceedingly difficult with 
present-day techniques. Typically 
with cross-media studies, one finds 
factor measures within a given me- 
dium coalescing. This difficulty 
arises for several reasons. The tech- 
nical skill necessary to construct 
factor measures which are free of 
other factors in the same media is 
lacking, even granting the added de- 
grees of freedom permitted by the 
use of oblique factor coordinates. 
Part of this difficulty arises from re- 
sponse biases of various sorts which, 
presumably, could be minimized or 
controlled with further knowledge. 
However, the major difficulty arises 
from the fact that any single rating 
scale or questionnaire item is likely 
to be factorially complex, and that
while this problem can be handled
theoretically (i.e., by balancing an
item loading positively on an irrel- 
evant factor with an item loading 
negatively), in practice this problem 
has not been solved. 
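
The theoretical balancing device mentioned above can be sketched as follows. The sketch assumes standardized items and orthogonal, unit-variance factors, in which case the covariance of an unweighted item sum with each factor is simply the sum of the item loadings on that factor. The item loadings and the function name are hypothetical.

```python
def composite_factor_covariances(item_loadings):
    """Covariance of an unweighted item sum with each factor.

    item_loadings: list of tuples, one tuple of factor loadings per item.
    Assumes standardized items and orthogonal, unit-variance factors.
    """
    n_factors = len(item_loadings[0])
    return [sum(item[k] for item in item_loadings) for k in range(n_factors)]

# Hypothetical items; columns are (target factor, irrelevant factor).
item_1 = (0.50, 0.30)    # loads positively on the irrelevant factor
item_2 = (0.50, -0.30)   # loads negatively on the irrelevant factor

print(composite_factor_covariances([item_1, item_2]))
```

With perfectly matched items the irrelevant factor cancels out of the composite. In practice item loadings are estimated with error and are rarely matched this exactly, which is the practical difficulty the text describes.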

Another problem (related in some 
ways to the above) is the fact that 
with present-day measurement techniques
there is no way of specifying
factor order in the personality do- 
main independently of a specific, 
arbitrary set of variables. Since it is 
not possible to specify beforehand 
the sampling of variables in BR and 
Q domains appropriate to a given 
order, there is no way of knowing that 
a given set of BR and Q variables will 
all lead to first-order factors which 
can then be cross-matched. In view 
of the haphazard method of item 
sampling, it is possible that ques- 
tionnaire factors will lie at different 
orders of generality from behavior 
rating factors. The relationship of 
order may also vary from factor to 
factor. Since there is no a priori 
solution possible to the variable sam- 
pling problem, it is doubtful if Cat- 
tell’s proposition is testable outside of 
a given arbitrary sampling system. 

Still another problem is the fact 
that the common language structure 
of a group of self-raters and judges 
could produce a pseudo matching of 
factor contents (salient variables). 
However, this meaning system bias 
would not necessarily lead to a 
matching of factor scores. The present 
writer believes that much of the 
hoped for cross-matching of BR and 
Q factors derives from the fact that 
common meaning systems bias the 
appearance of common patterns in 
“self” and “other” judgments. Such 
a result is, of course, artifactual and 
related to the problem of variable 
homogeneity. The greater the vari- 
able homogeneity (which in the Q- 
data media approaches the extreme 
of asking the same question again 
and again), the greater the probabil- 
ity that common meaning systems 
rather than common behavior pat- 
terns are producing apparent factor 
matches. Cattell, of course, is not 
insensitive to this problem (Cattell, 
1957). 
Besides the statistical and technical
difficulties with obtaining a proof of 
the cross-matching of BR and Q 
factors, the writer finds it difficult 
to believe on psychological grounds 
that such matching should occur in 
random samples of the population 
“with all but a few factors.” The 
proposition that BR and Q factors
can be cross-matched requires that 
a person’s judgments of his own be- 
havior correlate with and show a 
pattern similar to that derived from 
an external judge (or judges). This 
is an unlikely occurrence for several 
reasons: (a) defenses of various
sorts act to limit self-awareness, (b)
the frame of reference (adaptation
level) for making self-judgments will
vary from person to person, making
such scores not comparable, and (c)
the experiences to be judged of
necessity constitute different samples
of behavior for self and other raters.

These points can be illustrated by
the analogous problem of matching
factors in parents’ ratings with factors
in teachers’ ratings of children.
The parent, like the self-rater, is
biased by his emotional involvement
with his child and apt to be defensive
about the child’s weaknesses. Each
parent makes his judgments from a
different frame of reference, depending
on the number of children of his
child’s age he has known, memory
ability, and the like. Finally, the
parent bases his ratings on a different
sample of behaviors from that of the
teacher. In a study with Cattell
(Peterson & Cattell, 1958), only
part of which has been published, 
Peterson had teachers in the Univer- 
sity of Chicago Nursery School rate 
80 children, four to six years old, on 
36 variables, and had parents (mostly 
mothers) rate the same children on a
different sample of 44 variables which
covered much the same behavioral
domain. Peterson and Cattell were
able to identify 13 factors in each
analysis, named for Cattell’s factors
A, C, D, E, F, G, H, I, J, K, L, M, O.
These two sets of factors were cross- 
matched in terms of psychological 
meaning. When the correlations be- 
tween the teacher factor scores and 
the parent factor scores were com- 
puted for the matched factors, they 
were found to range from -.23 to
+.25, averaging approximately zero.
While this is an extreme example, it 
parallels most of the problems in- 
volved in matching BR and Q factors. 

The truth on this cross-matching 
issue probably lies somewhere be- 
tween the position taken by Cattell 
and the extreme example outlined 
above (cf. Guilford, 1959). There
are undoubtedly some points where 
questionnaire factors and behavior 
rating factors overlap, but there are 
just as undoubtedly many points 
where they do not overlap. 


DIRECTION OF FUTURE RESEARCH


Psychologists are going to con- 
tinue to use questionnaires because of 
their convenience. In terms of the 
present state of knowledge, it would 
seem mandatory that factors in personality
inventories be viewed as
dimensions of self-perception or di- 
mensions of the self-concept rather 
than in terms of assumed behavioral 
dimensions. It is of course completely
justifiable to find out just what kinds 
of behaviors factored questionnaires 
will predict, as both Guilford and 
Cattell have done. Further work 
along this line is in order.

The use of questionnaires is typi- 
cally justified in terms of one of two 
rationales, namely, as a substitute 
for a standardized interview, or as a 
means for the economical assessment 
of personality traits. It is only the 
latter rationale which is of concern at 
the moment. Historically there have
been two major criteria used in scoring
questionnaire items in personality
assessment: (a) an internal consist- 
ency criterion, which has gradually 
given way to the more sophisticated 
simple structure factor criteria and 
(b) a behavior rating criterion. Cat- 
tell and Guilford have taken the first 
approach, Hathaway and Gough the 
latter. Each of these methods has its 
liabilities. The factor criterion gives 
scores which lack a predictive be- 
havioral reference (though one might 
be interested in self-perceptions per 
se). The behavior rating criterion, 
as used by Hathaway and Gough, 
overcomes this difficulty by building 
predictive validity into the scoring 
system. Unfortunately, the resulting 
test scores from this latter approach 
typically have had high correlations 
among themselves, indicating that 
considerable time and effort were be- 
ing wasted in measuring, say, the 18 
variables in the Gough inventory 
(1957) which could probably be 
handled with four or five uncorrelated 
measures. 

The writer would like to suggest a 
third approach to this problem which 
might overcome some of the above 
difficulties. It is suggested that be- 
havior rating factors (as orthogonal as 
possible) be used as the criteria in 
developing questionnaire measures of 
personality. More specifically, the 
writer believes that Cattell’s be- 
havior rating factors (fusing a few of 
the highly interrelated factors and 
eliminating a few of the small vari- 
ance factors) have been sufficiently 
replicated on various populations to
be seriously considered as criteria for
questionnaire development. Since 
the cultural experience of men and 
women in our society is quite dif- 
ferent, it would probably be neces- 
sary to develop separate question- 
naires (beginning at the item validity 
level) for men and women. This 
methodology is applicable to various 
age groups, or to groups of special
clinical or theoretical interest (e.g.,
parents). Attention will, of course,
have to be paid to the problems of 
response bias and validity of report. 
A final possibility of this approach is 
the development of a single question- 
naire which can be scored in terms of 
either a self-perception-factor cri- 
terion or in terms of a behavior-factor 
criterion. The differences in the two 
kinds of scores (assuming adequate 
reliability) might provide interesting 
leads about possible areas of conflict 
or defensiveness. 


SUMMARY 


This paper challenges assertions 
made in recent publications by Cat- 
tell and his co-workers to the effect 
that present research has shown in 
most cases a one-to-one matching of 
behavior rating and questionnaire 
factors. A review of the evidence 
fails to support these assertions. 
Many of the technical, statistical, 
and psychological issues involved in 
the possibility of matching behavior 
rating and questionnaire factors are 
discussed. Suggestions for future re- 
search using questionnaires and be- 
havior ratings are given. 
The interest of modern psychology
in the experience of time (“time
sense,” time perception, time estima- 
tion, etc.) dates back to the last 
century. Reviews of experimental 
studies concerned with a number of 
aspects of time have been published 
in the Bulletin on several occasions. 
The last one, by Gilliland, Hofeld, 
and Eckstrand (1946), appeared 
some 10 years ago. 

During the past decade or two the 
interest in the psychology of time 
has not only increased but has broad- 
ened as well. Research workers have 
not been satisfied to study temporal 
experience or “time perception” as 
an isolated phenomenon. A trend has 
developed in the direction of investi- 
gating the relationships between tem- 
poral experience and other personality 
phenomena, normal and abnormal. 

In addition, considerable attention 
has been given to more general as- 
pects of temporal experience. This 
tendency has shown concern with 
the molar rather than the molecular 
approach to temporal phenomena. 
The terms “time perspective” and 
“time orientation” have become fa- 
miliar to social psychologists, per- 
sonologists and clinicians. Workers 
have become involved in the study 
of “macro-events” revolving around 
the relationships between persons’ 
past, present and future within their 


1 The writers wish to express their appreci- 
ation to Howard H. Fink for his contribution 
to an earlier draft of this paper. 


phenomenological frames of refer- 
ence. 

The present review will attempt to 
survey the more recent studies in 
time perception and methodology 
along traditional lines as well as 
studies in connection with the “new 
look” which involves the personality
correlates of time perception. Re- 
search embracing the seemingly dis- 
parate topics of time perspective and 
time orientation will also be dis- 
cussed. Because of the relatively un- 
crystallized state of this subject mat- 
ter, not only experimental studies, 
but also the more empirical-observa- 
tional, exploratory and frankly spec- 
ulative papers in this area will be in- 
cluded. 


DEVELOPMENTAL ASPECTS OF THE 
CONCEPT OF TIME AND 
TIME PERCEPTION 


Though philosophers resorted to a 
genetic explanation of the “time 
sense,” there is ample empirical evi- 
dence which indicates the develop- 
mental nature of this capacity. De- 
spite Kant’s speculations, there is 
evidence to indicate that the capacity 
to experience time and to estimate it 
is a gradually developing human 
characteristic (Fraisse & Vautery, 
1952). Within the last few years, ef- 
forts to present formal accounts of 
the development of temporal con- 
cepts and experience have emanated 
from two major lines of approach: 
(a) psychoanalytically oriented theorizing
and (b) extensive empirical
investigation. 


Psychoanalytic Views 


Although not directly concerned 
with the problem of the development 
of the time sense, Freud (1950, 1952; 
Schneider, 1948) suggested that the 
notion of time was derived from the 
manner in which the Perceptual-Conscious
system functioned. This
system was considered by Freud to be 
the medium through which impulses 
passed to cathected external objects 
and stimulation from the outside 
world returned to what Freud termed 
“unconscious memory systems.” 
When these cathexes were with- 
drawn, however, breaks in the conti- 
nuity of this dynamic process oc- 
curred, leading oftentimes to a com- 
plete absence of the external world. 
For Freud, such periodic interrup- 
tions lay “at the bottom of the origin 
of the concept of time” (Freud, 1952).
Implied in this formulation is the 
view that the development of the 
notion of time is a gradual process, 
resulting not from a single with- 
drawal of a cathected impulse but 
from many such withdrawals extend- 
ing over a period of time. 

At least an implicit acceptance of 
this view is to be found in other psy- 
choanalytic contributions to this 
area. For the most part, however, 
these reports are primarily con- 
cerned with specifying the phases of 
psychosexual development during 
which the notion of time emerges.
Yates (1938), for example, attributes
the development of the appreciation
of time to the patterns of body
rhythm that become manifest at the 
oral period. These rhythmic pat- 
terns, Yates continues, become re- 
lated to the periodic frustrations and 
satisfactions of bodily needs (especially
the intake of food), thereby
providing the basis for the development
of the time sense. Fenichel
(1945) and others (Dooley, 1941; 
Arieti, 1947), on the other hand, 
focus upon the anal phase of develop- 
ment as primary, and note the influ- 
ence of programs of toilet training 
upon the acquisition of the time 
sense. According to other psycho- 
analysts, however, later phases of de- 
velopment are considered to be more 
significant in this process (Bonaparte, 
1940; Oberndorf, 1941). 

Theoretical formulations from such 
psychoanalytic sources, so far with- 
out experimental validation, allow 
the following reconstruction of the 
development of the experience of 
time: the young organism begins to 
experience some frustrations quite 
early. Ordinarily, however, most 
early experiences of infancy are non- 
frustrating at first. The infant is
cared for and protected from his environment.
Every wish is fulfilled
immediately by adults in the environment.
As the organism grows and develops,
minor frustrations [...]
gradually learn to cope with the environmental
frustrations which result
from situations which frequently require
postponement of gratification.
He also must learn to manipulate the
environment and the elements within
it for his own purposes. These processes
are the stuff from which the ego
is made.

Several steps in the process of postponement
of gratification may be
posited:

1. Immediate frustration when need is not
gratified.

2. Suppression of need and of emotional reaction
to the experience of frustration.

3. Development of anticipation of later
gratification on the basis of:
a. past experiences,
b. “hallucination” (imagination) of gratification.

4. Projection into the future: expectancy.

5. Expectancy is possible providing the distinction
between reality and nonreality experience
can be made.

6. Development of symbolization via
words.

7. Development of the time sense expressed
by conventional language (now, later, today,
tomorrow, etc.).

8. The internalization of conventional time
divisions and the acquisition of the abstract
notion of time as a continuous, evenly flowing
substance, from the past into the future.


Empirical and Experimental Studies 


To a large extent, the findings of 
several empirically oriented investi- 
gators are compatible with the the- 
oretical position outlined in the pre- 
ceding paragraphs. Piaget (1955), 
for example, suggests that the earliest 
experience of time stems from a sen- 
sory-motor recognition of the waiting 
period between feedings, a view 
which is similar to one alluded to pre- 
viously. The child’s egocentricity 
and his relatively poor resources for 
differentiating clearly between the 
external world and himself, Piaget 
maintains, interfere with the de- 
velopment of objective concepts of 
time. As this early phase is out- 
grown, such concepts begin to emerge 
although they are not verbalized un- 
til the age of four. During this silent 
period, however, the ability to extend 
the idea of time into both the past 
and future continues to develop. 
When egocentricity starts to dimin- 
ish during the seventh year, the child 
begins to reflect upon his notions of 
temporality, and to compare them 
with the temporal notions of others. 
With added experience, modifications 
occur, and concepts are acquired 
which are more socialized, and thus, 
more adequate. A somewhat similar 
account of the order of development 
of time concepts in successive stages 
was reported by Bradley (1947). 

Other investigators, however, tend 
to disagree with Piaget’s formulation 
of the age of maturation of the notion
of time. Oakden and Stuart (1922), 
for example, suggest that the child 
has little ability to conceive of the 
continuity and development of time 
until the age of 11—a view supported 
by Wesley (1942). According to 
these writers, the adult concept of 
time does not fully mature until the 
thirteenth or fourteenth year. Pistor 
(1939, 1940) and Bromberg (1938) 
also point out that the time concept 
develops late in childhood, but not 
before the age of 10 or 12. Children 
first understand time as being num- 
bers on a clock, daily occurrences, 
etc., until finally the adult notion of 
time is attained. 

A lack of agreement is also to be 
noted in many studies of the evolu- 
tion of specific types of temporal con- 
cepts. In a rather detailed and sys- 
tematic investigation of the emerg- 
ence of the concepts of past, present, 
and future, Ames (1946) found that 
such temporal notions “come into use 
in a relatively uniform sequence from 
child to child and at about the same 
relative time in the life of every 
child.” An analysis of data from di- 
rect observation and questionnaires 
revealed that between the ages of 18
and 24 months the child lives predominantly
in the present, even
though some ability to project into 
the future may be developed during 
this period. “Words indicating the
present come in first, then words in- 
dicating the future and finally those 
indicating the past. Thus ‘today’ (24 
months) precedes ‘tomorrow’ (30 
months) which in turn precedes ‘yes- 
terday’ (36 months)” (Ames, 1946, p. 
122). From two to three years of age 
the child begins to utilize the concepts
of past, present, and future in
his verbalizations, but with a greater 
emphasis upon the future than on the 
past. An increase in the number of 
references to these time divisions occurs
between age three and age four,
but this period is also accompanied 
by some confusion between future 
and past activities. Nonetheless, 
Ames reports, this age brings with it a 
much greater projection into the future.
At age five, the days of the week
are used appropriately. At six, the 
four seasons are understood. The 
concept of the month is known at age 
seven, and by age eight, even extremes
of the time span can be
handled adequately. In general, the
results of Schechter,
Bernstein (1955) tend to support
these findings.


The use of words denoting temporal [...]
At age seven, [...]
seasons, etc., the notion of objective
and continuous time, i.e., clock time,
is acquired.


Similar observations, based on less
systematic data, were made by Bromberg
(1938), who states that “The
development of the time sense (duration?)
does [...]
age of 5 or 6, and develops slowly
until about the age of 10 or 12” (p.
147). Another interesting observation,
by the same author, which appears
somewhat contradictory to
that implied by Ames concerning in- 
dividual differences, was: “Although 
the age of maturation depends on the 
intelligence, the formation of the time 
concept seems to occur at the same 
relative time in the life of every child. 
In other words, the development of 
abstract concepts such as time seems 
to meet a certain social need.” Yet, 
it is doubtful whether the uniformity 
of the age at which temporal concepts 
appear is a defensible notion. Buck’s 
Time Appreciation Test (1946) and
its comparatively high correlation
with intellectual level, in addition to
other findings relating time concepts
to systematic instruction (Friedman,
1944, 1945), would seem to contradict
this notion.

Springer’s (1952) findings with
older children (four to six) generally
corroborate those of Ames with
respect to the development of the
understanding of [...]. She summarizes her findings:
“First, the child is able to tell time of
activities which occur regularly in his
daily schedule. . . . Second, the child
is able to tell time by a clock. . . .
Third, he is able to set the clock. . . .
Fourth, he is able to explain why the
clock has two hands and how each
operates. . . .” (p. 95). Further support
for this position with children of
high I.Q., between the ages of six and
seven, is presented by Farrell (1953).

It is interesting to note that there
is considerable similarity between the
early temporal concepts of young
children and the use of time among
the primitives described by Werner [...].
The young child, at first, also
deals with discrete events to which
he attaches [...]

[...] differing somewhat
from those found by Ames are reported
in a study by Friedman
(1943-1944). In testing 697 children
[...] school (including kindergarten
pupils), this experimenter
utilized a technique which was composed
of 12 events. Each event had
to be placed in one of four categories:
a long time ago, a short time ago, a
short time to come, and a long time
to come. It was found that the older
pupils (Grades 4-6) showed a great 
improvement over the younger pupils 
in the ability to understand these in- 
definite concepts. On the basis of his 
findings, Friedman (1943-1944) con- 
cluded: “There is not so much logic in 
the child’s thinking concerning the 
future as in his thinking concerning 
the past... . The idea of tomorrow 
appears to be less definite than is the 
concept of yesterday” (p. 340). 
According to Eson (1951), the 
child of eight is primarily concerned 
with the present and projects into the 
future only on those few occasions 
when he is stimulated to anticipate a 
future event. His conception of the 
past is also fairly restricted, being 
limited to those experiences which 
he has had, but whose consequences 
are still pending. Eson also investigated
what he termed “temporal emphasis,”
or the stress given to the past
and the future in the thoughts and con- 
versations of his Ss. Seven groups of 
adults and children, representing five 
different age ranges, were utilized in 
the test of the hypothesis that the 
ranges of time perspective increase in 
the direction of both past and future 
in fairly equal proportions with ad- 
vance in age. The Ss were asked to 
list those items which they had 
thought about or spoken about dur- 
ing the two-week period preceding 
the interview. An analysis of some 
of these data in terms of its past or 
future reference reveals that each of 
the groups placed greater emphasis 
on the future than on the past. 
Gilliland and Humphreys (1943) 
have compared fifth graders with 
adults, with respect to ability to esti- 
mate short intervals of time (9-180 
seconds). The results indicate the 
superiority of the adults over the 
children by 15 to 18%. Hence, the 
implication here is that with respect 
to estimation of time, learning still
takes place between preadolescence
and adulthood. It seems that the in- 
ternalization of the notion of time 
and duration in relation to objective 
standards is as yet incomplete in late 
childhood. 

A more recent study (Smythe & 
Goldstone, 1957) found that six- and 
seven-year-olds are extremely vari- 
able in their estimates of one second 
when compared with older children, 
8 to 14 years of age. Moreover, the 
younger children do not seem to im- 
prove nor do they learn “from specific 
time information,” whereas the older 
children tend to show improvement 
with the aid of such information. The 
accuracy of estimation of one second 
by 14-year-olds did not differ appre- 
ciably from that of adults. 


Summary 


On the basis of the findings re- 
ported above, it appears that an in- 
dividual’s concept of time emerges 
early in childhood and develops 
gradually. By the time a child is two 
or three years old, he has acquired a 
notion, more or less limited, of a past, 
a present, and a future time, but un- 
til the eighth year, the child is pri- 
marily concerned with his immedi- 
ate present. The time concept, with 
ever widening past and future refer- 
ences, continues to develop through 
the thirteenth or fourteenth year 
when the adult concept first emerges. 
At that time the notion of continuity 
of time and its relatively accurate es- 
timation are reached. 


THE PASSAGE OF TIME 


A number of experiments on time 
perception and time estimation have 
been reported since the reviews by 
Gilliland et al. (1946) and by Wood- 
row (1951). As a result of consider- 
able variation both in definition of 
concepts and in methodology, the 
results obtained are contradictory 
and confusing. An analysis of methodological
procedures proposed by
Clausen (1950), reinforced with additional
detail by Bindra and Waksberg
(1956), attempts to put some
order in the field of experimentation 
with the passage of time or duration. 


Methodology 


These investigators have pointed 
to four major methods by means of 
which short time intervals are judged 
in estimation of time experiments. 
These methods are: (a) verbal estimation,
in which E presents an interval
(standard) and S is asked to respond
verbally in terms of temporal units
([...] etc.) indicating
the duration of the interval; (b) production,
[...] S to produce an
interval of a length indicated
verbally by E; (c) reproduction, in which E,
by means of some operation, presents
an interval (standard) and S is ex[...]

Two broad methods may be dis- 
tinguished with respect to the presen- 
tation of the “standard” to be 
judged: 


1. “Empty” Time. This refers to the time or
interval which is bounded by two stimuli,
such as two clicks (Ross & Katchmar, 1951),
two flashes of light (Abbe, 1936; Fox, 1952),
and others (Bakan & Kleba, 1957). Criticism
of this method stems from Kowalski (1943)
who points out that “subjects tend to incorporate
into the time interval these limiting
sounds, which themselves take an appreciable
length of time.” This criticism is particularly
pertinent when very brief intervals are under
consideration. The study by Rodgers and
Hammersley (1954) is also relevant in this
connection.


2. “Filled” Time. [...] Harris, 1952; Phillip, 1944 [...]
(Cooper, 1948), light (Kowalski, 1943; Phillip,
1945), and by a variety of assigned tasks
and/or activities (Coheen, 1950; Filer &
Meals, 1949; Hindle, 1951; Kawasima, 1937;
and others).


Clausen (1950) attempted to com- 
pare the several methods of interval 
presentation under conditions of filled 
and unfilled time. Using intervals of 
5, 10, and 15 seconds, Clausen found 
no significant differences between the 
estimation of filled and unfilled inter- 
vals. He also found that the method 
of verbal estimation results in less ac- 
curacy (overestimation) than the 
other methods employed and that 
the method of reproduction is less 
consistent and less reliable. He 
makes the important point that the 
methods of verbal estimation and 
production deal with the relation of
subjective time to “world time”
(clock time) whereas the method of
reproduction does not. Different
functions are apparently involved.


Time Estimation Under Different Conditions


[...] Bindra and
Waksberg (1956), an attempt will be
[...] from being uniform.

Verbal estimation. Most of the investigations
on the perception of time
have utilized this method of presen- 
tation. Postman (1944) had 40 un- 
dergraduates estimate three-, five-, 
and seven-minute periods while en- 
gaged in several different activities 
(addition, cancellation, and completion).
In general, there was a tendency
to overestimate the actual intervals.
The kind of task with which the
intervals were filled made no difference
in the estimation, but a positional
effect was noted. In Loehlin’s
(1956) study one- and four-second 
intervals and two- and twenty- 
minute intervals were employed with 
a similar population. The shortest 
and longest intervals were clearly 
overestimated, the four-second interval
was not, whereas the two-minute
interval was overestimated. The
trends were determined in terms of 
mean ratios of estimated to actual 
time. This method is different from 
the one in the previous study in which 
the accuracy of estimates was re- 
ported in terms of deviation and may 
partly account for the somewhat dif- 
ferent results. It is interesting to 
note that Loehlin was unable to obtain
a general factor of time estimation
by means of his factor analysis
of the data. Lhamon and Goldstone
(1956) compared normals and
schizophrenics on the estimation of 
one second, presented as a continu- 
ous tone. Both groups tended to be 
stable in their estimates; both groups 
overestimated, though the schizo- 
phrenics were more extreme in this 
respect. The same method was em- 
ployed in another study (Smythe & 
Goldstone, 1957) where it was found 
that the tendency to “overestimate 
the value of 1.0 second is character- 
istic of all age groups.” A somewhat 
surprising finding is reported in still 
another study (De Rezende, 1950) in
which intervals of 2 and 12 seconds 
were used. A tendency to overesti- 
mate the longer intervals and under- 
estimate the short ones is reported 
and seems to be inconsistent with 
many of the studies previously re- 
viewed. In a series of experiments 
(Roelofs & Zeeman, 1951) intervals 
between 400 and 3200 milliseconds 
were used. Order and filled vs. unfilled
conditions were considered in
detail. At 3200 ms. order was more
important than the contents of the in- 
terval, at 400 ms. it was less impor- 
tant. Empty intervals of 1800 ms. 
were underestimated when compared 
with filled ones (continuous light). 
No important differences were ob- 
tained between the 3200 ms. inter- 
vals and the others. Some of the 
practical aspects of the errors in the 
estimation of short intervals were 
considered by Culbert (1954). 
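
The two scoring conventions contrasted above for verbal estimates, the mean ratio of estimated to actual time and the deviation of the estimate from the actual interval, can rank the same judgments differently. The following is a minimal sketch of that point only; the function names and the sample estimates are hypothetical and are not taken from any of the studies cited.

```python
from statistics import mean


def mean_ratio(estimates, actual):
    """Mean of estimate/actual; a value above 1.0 indicates overestimation."""
    return mean(e / actual for e in estimates)


def mean_deviation(estimates, actual):
    """Mean signed deviation (estimate minus actual), in the interval's own units."""
    return mean(e - actual for e in estimates)


# Hypothetical verbal estimates in seconds (illustrative only, not study data).
short_actual, short_estimates = 1.0, [1.5, 2.0, 1.0]
long_actual, long_estimates = 120.0, [130.0, 125.0, 120.0]

for label, actual, estimates in [
    ("short", short_actual, short_estimates),
    ("long", long_actual, long_estimates),
]:
    # Print both scores side by side for each interval length.
    print(label, round(mean_ratio(estimates, actual), 3),
          round(mean_deviation(estimates, actual), 3))
```

With these made-up figures the ratio measure ranks the short interval as the more overestimated, whereas the deviation measure ranks the long interval as the more overestimated, which is one way in which two studies using different scores could appear to disagree.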
Production. Eson and Kafka 
(1952) who noted some “diagnostic 
implications” in the study of time 
perception had their Ss (college 
students) produce 15-second and 
two-minute intervals under four con- 
ditions (silencing a tone, leaving a 
room in darkness, keeping the room 
lighted, and listening to a tone after 
switching it on). Most Ss underesti- 
mated the time intervals; the effect 
of serial position was to reduce the 
overestimation with succeeding judg- 
ments. The authors point out the in- 
dividual differences between their Ss 
with respect to stability of estimation 
and the experience of the rate of pas- 
sage of time. Using a 15-second in- 
terval, Falk and Bindra (1954) re- 
lated greater overestimation to the 
experience of anxiety. A combina- 
tion of the estimation and produc- 
tion methods is represented in Dob- 
son's (1954) study. Intervals of 17, 
38, 72 and 120 seconds were em- 
ployed under filled and unfilled con- 
ditions. The effects of filled vs. 
unfilled conditions upon estimation 
are “still open to question”; also, as
far as the estimation of the intervals 
is concerned “the tendency was for 
less accuracy and greater variation 
with increasing time intervals.” An 
additional variation of this study is 
the introduction of “non-set” conditions
for time estimation. Subsequent
studies (e.g., Rabin, 1957)
also employed this method. Bakan
(1955) found no significant differ- 
ences in the estimation of a one-hour 
work period between set and non- 
set conditions. 
Reproduction. This method was 
employed in Kowalski’s (1943) ex- 
periments with intervals ranging 
from .48 to 16.20 seconds. He also 
varied the delays between intervals 
to be reproduced. The durations of 
.48 and .92 seconds were consistently
overestimated, whereas the remaining
intervals up to 16.20 seconds
were underestimated. The final 
conclusion, based on analysis of 
variance, is that the durations of the 
intervals rather than those of the de- 
lays are the significant factors in un- 
der- and overestimation. This con- 
clusion appears to be in contradiction 
of Fox's (1952) findings reported at a 
later date. Such consistent differ- 
ences with respect to the stimulus 
were not obtained by Young and 
Sumner (1954) who experimented 
with a 30-second and five-minute in- 
terval. These investigators focus on 
the consistency of individuals in
their reproductions of the standard
rather than on the characteristic
(duration) of stimuli or the intervals
between stimuli during presentation.
A study (Phillip & Lyttle, 1945) 
which attempted to relate reaction 
time to the ability to reproduce short 
time intervals (.73-1.7 sec.) yielded 
negative results. 
Comparison. This procedure has 
been mainly used in the investiga- 
tion of the contextual effects upon 
the estimation of time. These studies 
(Phillip, 1944, 1947; Postman & 
Miller, 1945) demonstrate the anchor 
effect upon the estimation of very 
brief intervals (usually, fractions of a
second to about two seconds). Addi- 
tional, more recent studies (Wright, 
Goldstone, Boardman, & Lhamon,
1958; Weinstein, Goldstone, &
Boardman, 1958) have also demon- 
strated the anchor effect in temporal 
estimation—the “short anchor pulled 
judgments down and the long... 
up” (Goldstone, Lhamon, & Board- 
man, 1957), a finding expected in 
temporal estimation studies but dif- 
ferent than that found in traditional 
research involving anchor effects. A 
study by Turchioe (1948) concludes 
that pronounced central tendency 
effects for all intervals (.78, 1.01 and 
1.39 seconds) were obtained. 

By means of fractionation of tem- 
poral intervals, Gregg (1951) devised 
a “scale of subjective time” (“temp 
scale”), and Ross & Katchmar (1951),
using a similar method, derived individual
“chron-scales.”


Additional Experimental Factors 


Numerous investigators (e.g., 
Fraisse, 1948) have noted that per- 
ceptual experiences in a variety of 
sense modalities have their effects
upon “time perception” or time estimation.
Some studies (Fraisse &
Oleron, 1951; Oleron, 1952) have
concluded that the increase of the
intensity of sound during an interval
causes overestimation. Dependence
of time judgment, in the young
child, upon space and speed of moving
objects was reported by Fraisse
and Vautery (1952). The relationship
of “seeming duration of time
since different past events” to the
“chronological time since these
events” has its unique spatial representation
(Cohen, Hansel, & Sylvester,
1954). Cohen (1954) also
discussed the “Kappa-effect” in re- 
lation to judgment of duration. 
Kinesthetic cues (counting aloud) 
have been reported to cause the over- 
estimation of a one-second interval 
(Goldstone, Boardman, & Lhamon, 
1958) in children and young adults 
(but not in older Ss). The contextual
aspects of time judgment (figure- 
ground) were reported by Iacono 
(1952b) who also doubts that the 
very brief intervals studied are ac- 
tually experiences of duration 
(Iacono, 1952a). Finally, the gestalt 
aspects of the estimation of time and 
its dependence upon sense modality 
(such as vision and audition) were 
illustrated by Nakajima (1951). 
Errors of estimates of time via visual 
stimuli were mostly negative, whereas 
the opposite is true with auditory 
ones. These configurational and 
patterning features were also emphasized
in some of the investigations
cited in an earlier section (Roelofs &
Zeeman, 1949, 1951). 


Summary 


Several methods of presenting in- 
tervals for estimation in time experi- 
ments have been distinguished. Some 
evidence points to lesser accuracy of 
the verbal estimation method as com- 
pared to the others. The issue of 
filled vs. unfilled time still remains 
unsettled, though the bulk of the 
evidence shows little significant dif- 
ference in time estimation under the 
two conditions. Perhaps the lack of 
significant results is due to the fact 
that the distinction between filled vs. 
unfilled time is in the mind of the E
rather than in the experience of the S. 

Generally, very brief periods (sec- 
ond, or fraction thereof) tend to be 
overestimated, while longer ones are 
underestimated. Positional and an- 
chor effects in brief interval experi- 
ments have been demonstrated and 
have probably been confounding 
factors in a number of experiments 
which have apparently produced 
contradictory results. The differen- 
tial effects on judgment of stimuli of 
different modalities, used in delineating
intervals of time, have been
pointed out again by recent investigators.
This is especially true in the
judgment of brief periods the dura- 
tional reality of which seems to be 
questioned. 

The studies considered in this sec- 
tion have either concentrated on the 
stimulus conditions or deal with such 
conditions as a part of their investiga- 
tion. One may agree with Adler 
(1954) that despite the recognition 
of the subjective factors in “time 
perception” many of the investi- 
gators “have not been able to free 
themselves from the stimulus as 
given and ‘real’.” Perhaps the
greater emphasis on the phenomeno- 
logical viewpoint, such as has been 
suggested (Cohen, 1954) will be more 
productive in contributing to further 
understanding of the processes of 
temporal experience or the hitherto 
elusive “time sense.” 


PASSAGE OF TIME AND PERSONALITY VARIABLES


An area of recent concern to in- 
vestigators has been the study of 
temporal phenomena in relation to a 
number of personality variables. Al- 
though the discussion of reports in 
this area may overlap to some extent 
with other sections, especially the 
one on time perspective, a sufficient 
amount of work has been done to 
merit a separate rubric. 

Rapaport (1950) suggested that 
“time experiencing appears to be 
itself a phenomenon dependent on 
affective life,” a view also supported 
by Clauser (1954) and Fredericson 
(1951). Rapaport further proposes a 
connection between the experience 
of time and temperament. Such a 
relationship may be exemplified in 
the differences among people in their 
readiness to act upon an idea or im- 
pulse or their tendency to delay, to 
postpone, to “take more time.”

Several studies attempting to relate
the estimation of time to some
personality variables are available. 
The unconscious conception of par- 
ents as dominant was found to be 
related to overvaluation (overesti- 
mation) of periods of one minute and 
higher (Fisher & Fisher, 1953). An 
MMPI study of 71 Ss with high T 
scores on individual scales (Solomon, 
1950) showed that those scoring high 
on one or more scales (excepting 
those on the manic scale) tended to 
make high estimates of periods rang- 
ing from 30 to 190 seconds. Also, the 
tendency for manic individuals to 
experience time more quickly than do 
depressed individuals was brought 
out. One study, mentioned earlier 
(Loehlin, 1956) reports an attempt 
to throw into the hopper measures of 
time estimation, questionnaire data 
(on phenomena of time perception)
and MMPI scores, and then to factor- 
analyze them all. The author claims 
to have obtained at least “scattered 
information relevant to the integra- 
tion of time perception into the ma- 
trix of personality.” At least the in- 
formation obtained is “scattered” 
since it is difficult to extricate any 
clear relationship between temporal 
experience and personality variables 
from the factors extracted in this 
investigation. However, the merits
of this attempt along relatively un- 
charted lines cannot be denied. 

Motivation as a factor in time
estimation has been [...]
some extent. [...] words). They [...]
takes 10 minutes. The first group [...]
was to be dismissed from class at the end of the
10 minutes, while those of the second
group who had written the 150
words in the time allowed would get
a box of chocolates. The control 
group was told that they would re- 
turn to the regular classwork upon 
the termination of the task. All 
groups were interrupted at the end of
4’ 37” and asked to estimate the time 
that had elapsed. Both experimental 
groups overestimated the time when 
compared with the control group. 
According to the authors “an attrac- 
tive goal affects the psychological 
distance to the goal,” i.e., time passes 
more slowly (and is therefore over- 
estimated) when Ss are motivated to 
have time pass. Another, related 
study (Burton, 1943) reports no con- 
sistent or significant findings with 
respect to overestimation of time 
under conditions of monotony (pre- 
sumably, no motivation for the task 
but motivation for time to pass 
quickly). A trend in the direction of 
overestimation under conditions of 
monotony was noted, however. 
Hindle’s (1951) investigation of 
“Time estimates as a function of 
distance traveled and relative clarity 
of a goal” is more clearly related to 
the one by Filer and Meals (1949).
The findings of this investigator indi- 
cate that “during the latter portion 
of an activity, when it leads to a 
clearly defined goal, estimates of 
time spent increase more slowly with 
increments in score than estimates 
with no defined goal” (Hindle, 1951, 
p. 501). One may note that the
group with “no defined goal” was 
subject to a greater feeling of monot- 
ony and therefore gave higher esti- 
mates. On the other hand, it would 
appear that these results are incon- 
sistent with those of Filer and Meals 
who report greater overestimation of 
those who were approaching an 
“attractive goal.” Such an attractive
goal was actually not part of Hindle’s 
complex design. The “goal” in her 
experiment was mainly the completion
of the task (“no reward”). Her
subjects, therefore, were not so mo- 
tivated to have “time pass quickly.” 
The definitions of “goal,” incentive,
and motivation certainly vary from
experiment to experiment. The re- 
sults and conclusions are, therefore, 
not comparable. The “motivational” 
aspects in Hindle’s study are in 
doubt, although the relative lack of 
motivation (monotony?) in the “no-goal
group” may be a distinct possibility.

Under conditions of stress intervals 
of time tend to be overestimated ac- 
cording to the findings of Falk & 
Bindra (1954). The experimental 
group in this study received a shock 
at the end of some of the trials in 
producing a 15-second interval, while
the control group did not. The authors
conclude that “greater overestimation
of the interval by the experimental
group... is interpreted
in terms of an anxious set induced by 
expectation of shock.” A related 
study (Henrickson, 1948) of the 
“judgment of speaking time” among 
students suffering from various degrees
of “stage fright” (anxiety?)
reports no significant findings with 
respect to overestimation of temporal 
intervals. 


Summary 

The relatively few studies cited in 
this section indicate that the study of 
relationship between temporal ex- 
perience (perception-estimation) and 
personality variables is still in its 
infancy. Variability in experimental 
design and terminology makes it virtually
impossible to compare various
findings or to draw definite conclu- 
sions of a general nature from them. 
Some credence in a relationship be- 
tween the estimation of time and 
stress and motivation may be justified
by the findings reported in these
studies. 


TEMPORAL EXPERIENCE IN 
PSYCHOPATHOLOGY


Theoretical Views 


A good deal of clinical material, 
observational data and a review of 
similar reports by others concerning 
the disturbance of time experience in 
mental disorders is found in Schilder’s 
(1936) extensive discussion of the 
psychopathology of time. He related 
particularly the phenomenon of de- 
personalization to disturbances in 
the experience of time—a loss of im- 
mediate experience of time and of the 
sense of the word “time,” the feeling 
of the present as if it were the far- 
away past and inability to distin- 
guish between immediate and distant 
past. More recently Dubois (1954) 
discussed the sense of time in relation 
to motivation of human behavior and 
has considered temporal distortions 
to be basic to all types of mental ill- 
ness. Fischer (1929) maintained that 
there was no schizophrenic disorder 
which was not a space-time disorder. 
Others (Haranyi-Hechst, 1943; Kier- 
sen, 1951) suggest that timelessness 
and other temporal distortions are 
quite common in schizophrenia. For 
Minkowski (1926), extreme distor- 
tions of subjective time were the 
central symptoms of schizophrenia 
and were prominent in other psy- 
choses as well. Somewhat similar 
positions are held by Lewis (1931- 
1932) and by Strauss (1947) who 
points to temporal distortions in 
depressive states. Other writers 
(Federn, 1952; Hollos & Ferenczi, 
1925) note the relationship between 
abnormal temporal experience and 
psychosis. Van Der Horst (1932) attempted
to explain Korsakoff’s syndrome
as the loss of temporal aspects
of experience. 

According to Werner (1948) the 
reports of patients with temporal dis- 
turbances reveal a much greater ego- 
centricity and concreteness in their 
utilization of temporal concepts than 
is common in the normal person. 
Since such temporal experiences are 
similar to those found in primitive 
peoples and children, Werner postu- 
lates that psychopathologically dis- 
turbed individuals reveal a (tem- 
poral) regression in which the normal 
adjustment of personal time to world 
time no longer is carried through 
completely.

Psychoanalytic writers also sup- 
port the notion of regression in their 
discussions of disturbances in tem- 
poral concepts found in many mental 
disorders. In many of these views 
specific types of temporal distortions 
are associated with difficulties en- 
countered at early levels of psycho- 
sexual development. Yates (1938) 
points out that prolonged disharmony 
between the appearance of infantile 
needs and their satisfaction, especially
in the oral period, may give
rise to a loss both of the awareness of 
the passage of time and of sense 
of reality. Other writers (Bergler & 
Roheim, 1946; Scott, 1948; Schneider,
1948) also note the close connection
between disturbances in the appreciation
of reality and temporal distortions.
Fenichel (1945) and others
(Schilder, 1936; Dooley, 1941; and 
Arieti, 1955) suggest that fixations at 
the anal level, said to be primary in 
the later development of obsessive-compulsive
neurosis and paranoid
schizophrenia, also are characterized
by particular disturbances in temporal
conceptions. Oberndorf (1941)
discusses temporal distortions originating
at both the oral and anal periods
of personality development.
Temporal difficulties in neurosis are
also noted by Wolff (1950). 

Summary. In general, the preced- 
ing accounts agree in suggesting that 
(a) disturbances in temporal experi- 
ence are related to emotional or 
neurological difficulties, and (b) at 
least some types of psychopathology 
are characterized by specific temporal 
disturbances. This view is perhaps 
best summarized by Schneider 
(1948), who states rather generally 
that “the manner in which a person 
handles time... is very closely 
linked to the structure of his char- 
acter and to the nature of his neuro- 
sis.” This view is also supported by 
Meerloo (1950). 


Empirical Findings 


For the most part, little empirical
validation of any of the preceding 
views is reported in the literature. 

In a well-conceived study of time
perception in neurotic, schizophrenic, 
and “normal” Ss, Dobson (1954) 
found no significant differences 
among the mean estimates of 17, 38, 
and 72 seconds, although the schizo- 
phrenics revealed a greater variation 
in their estimates. Johnston (1939)
reported that schizophrenics differed
from normals not only in their estimates
of 5-, 10-, 90-, and 100-second
intervals but also in the variability 
of their estimates; they underesti- 
mated the short intervals and over- 
estimated, more than the normals, the 
longer ones. Rabin (1957) also found 
significant differences (bimodal distri- 
bution) between schizophrenics and 
nonpsychotics in judgments of longer 
periods of time (one-half hour to an
hour). De La Garza and Worchel
(1956) reported that schizophrenics 
were significantly poorer than
normals on time-orientation tests. 
Adler (1954) reported that schizophrenics,
as compared to psychopaths
and normals, tended to overestimate
short periods of time (30 seconds, 
minute, 24 minutes, and 3 minutes) 
spent on a number of tasks. Lhamon 
and Goldstone (1956) demonstrated 
greater overestimation of one second 
by schizophrenics than by normals. 
Also it has been found that schizo- 
phrenics are less subject to remote an- 
chor effects (Weinstein, Goldstone, & 
Boardman, 1958; Wright, Goldstone, 
Boardman, & Lhamon, 1958) than 
are normals. 

Coheen (1950) reported that the 
degree of agnosia in organic patients 
was roughly proportional to degrees 
of deterioration. Fraisse (1952), com- 
paring functional psychotics, organic 
psychotics and aphasics, found that 
patients with functional psychosis 
appear to have the greatest difficulty 
in perceiving and reproducing a short 
interval of nonfilled time. 

Brower and Brower (1947) studied 
the relationship between temporal 
judgment and social competence in 
the feeble-minded and reported that 
mental ability has more influence 
on time orientation and estimation 
than social competence. A close rela- 
tionship between the knowledge of 
time and the Stanford Binet in de- 
fective children was obtained in an- 
other study (Gothberg, 1949). Both 
studies provided evidence concerning 
the relative primitiveness of the con- 
cept of time in feeble-minded chil- 
dren. 


Summary 

Although a number of studies of 
distortions in temporal experiences as 
revealed in mental disturbance have 
been reported in recent years, the 
total number of investigations in this 
area is relatively small. For the most 
part, little light has been shed on the 
nature of the processes which underlie 
the manifestations of temporal distortions
in psychopathological conditions.
Considerable future work,
both in terms of theoretical formulation
and experimentation, will be
necessary before substantial progress 
can be reported in this area. 


TEMPORAL EXPERIENCE UNDER 
SPECIAL CONDITIONS 


Other research efforts concerned 
with temporal phenomena have in- 
volved the study of temporal experi- 
ence under a variety of conditions. 
Included among the topics investi- 
gated are changes (distortions) in 
temporal experience as a result of the 
effect of various drugs and hypnosis. 


Drugs 


A perusal of the literature reveals 
that the number of research reports 
in this area, since the review by Gilli- 
land et al. (1946), is rather small. 
This review reported the extreme 
tendency to underestimate and over- 
estimate time under mescal intoxica- 
tion. A similar effect of marijuana 
was also reported. Quinine and alco- 
hol caused underestimation, while 
caffeine and thyroxin yielded incon- 
sistent findings. 

More recently, nitrous oxide was 
shown to influence subjective impres- 
sions of changes in rate of passage of 
time with Ss revealing both quicken- 
ing and slowing experiences (Stein- 
berg, 1955). Another study with 
marijuana, but with no new conclu- 
sions, was reported (Bromberg, 1941). 
Neither has LSD (lysergic acid diethylamide),
which is supposed to produce
schizophrenic-like symptoms,
yielded clear-cut effects upon the estimation
of short time intervals (Boardman,
Goldstone, & Lhamon, 1957).
Another mescal report by Wolfe 
(1952) added little to the findings 
previously reported. 


Hypnosis


A review of studies of the percep- 
tion of time under hypnosis was pub- 
lished by Loomis (1951) who con- 
cluded that hypnotic subjects were 
sensitive to temporal factors and were 
able to estimate time more accurately 
than nonhypnotized persons. Coo- 
per (1948, 1950) reported that the 
time sense may be deliberately dis- 
torted under hypnosis so that sub- 
jects may report even extremely 
brief intervals as long ones. Cooper 
and co-workers also reported that 
temporal distortion is found in sub- 
jects performing tasks involving mo- 
tor (Cooper & Tuthill, 1952) and non- 
motor (Cooper & Rodgin, 1952) 
learning. Other reports (Cooper & 
Erickson, 1950; Cooper, 1952; Eisenbud,
1947; Gross, 1949) have suggested
that temporal distortion is
found in dreams as well.


Summary 


There seems to be some evidence 
that a number of drugs have a dis- 
torting effect upon the experience of 
time. However, the effects do not 
appear to be clear-cut or unidirectional
in all instances. Hypnosis and
dream states also have their effects. 
Hypnotic suggestion may give direc- 
tion to the distortion. As to dreams, 
subjective reporting is the only source 
of evidence and may be fallible, 


PHYSIOLOGICAL CORRELATES OF 
THE EXPERIENCE OF TIME


Theoretical Views


A major focus in this area has involved
efforts to specify the
physiological locus of the “time
sense.” Contributions
range from the view of [...],
who states that “the [...]
temporal organization . . . is not localizable
to some one portion of the
brain” to directly antithetical positions
espoused by Schilder (1936),
that the parietal lobe and adjacent 
parts of the temporal lobe are the 
specific site of the experience of time, 
and Campbell (1954) who suggests 
that the organization of the cerebral 
cortex can be divided into past and 
future orientation. 

Other writers have pointed to the 
association between time sense and 
other physical and physiological fac- 
tors. A number of reports in this area 
have been reviewed by Gilliland et 
al. (1946). Davidson (1941) proposes 
that the development of the time 
sense parallels the process of mye- 
linization. Hoagland (1933, 1934, 
1943) postulated the existence of a 
chemical brain clock in which the 
judgment of time is attributable to a 
master chemical reaction involving 
the respiration of certain parts of the
brain, a view also held by Piéron
(1936). Others propose that time
perception is based upon body metabolism
and physiological factors,
and Lecomte du Noüy (1937, 1939,
1944) has suggested quantitative biological
methods of measuring the individual
time sense. MacKenzie and
Munsterberg, as reported by Eson
(1951), also suggest that time perception
is based upon rhythmical
physiological processes.


Empirical Findings 


On the basis of evidence available 
until 1946, Gilliland, et al. (1946) 
concluded that physiological hypoth-
eses were inadequate as an explana-
tion of the perception of time and
that the most significant factors in 
the estimation of time were external. 
The number of publications in this 
area subsequent to 1946 has been 
negligible and has presented no ex-
perimental evidence to refute Gilli-
land’s conclusion. It seems apparent,
however, that much research remains to
be done before a definitive theoretical 
framework can be developed to ac- 
count for the relationships between 
internal physiological factors and ex- 
ternal factors in the estimation of 
time. 


THE CONCEPT OF TIME
PERSPECTIVE


Theoretical Considerations 


In addition to a rather sizable num- 
ber of studies regarding temporal 
concepts such as time sense and per- 
ception of time, a recent focus of at- 
tention has involved what has been 
termed time perspective. As far as can 
be determined from a perusal of the 
literature, Frank (1939) was the first 
to present a detailed consideration of 
this idea and to speculate about its 
relation to human conduct. Accord- 
ing to this view, culturally deter- 
mined attitudes about notions of 
temporality constitute one major as- 
pect of the influence of culture upon 
behavior. He pointed out that tem- 
poral factors not only figure promi- 
nently in demarcating culturally ac- 
ceptable times of occurrence for spe- 
cific life-phenomena but also underlie, 
albeit implicitly, the transmission of 
cultural standards and mores, since 
parental child-training procedures, 
serving to accomplish this end, are 
subtly embedded in a temporal di- 
mension. In addition to including 
such considerations within the frame- 
work of an explanatory formulation 
of the development of time perspec- 
tive, Frank also described, within the 
context of a field-theoretical ap- 
proach, the dynamic interplay be- 
tween acquired conceptions of the 
past and the future, especially in 
terms of their ever-changing influence
upon notions of the present, a topic
also discussed by Perls (1947), Mowrer 
and Ullman (1945), and others. 


In an approach which may also be 
characterized as field-theoretical, Le- 
win (1946, 1951) was concerned with 
time perspective as a dimension of 
the “life space” and refers briefly to a 
formulation which might account for 
its development. In another refer- 
ence to this topic, Lewin (1942) sug- 
gests that the social environment in 
which an individual lives may ma- 
terially influence his time perspec- 
tive—a view also discussed by Smith 
(1952). If a person lives in a social 
environment which is autocratic, for 
example, the future is decided and 
structured by the leader. An individ- 
ual in a “democratic” group, on the 
other hand, takes an active role in 
determining both his own and the 
group’s plans. In a somewhat re- 
lated view, Hulett (1944) proposes 
that group experiences in general pro- 
vide contexts for learning to antici- 
pate the social roles of others, and 
therefore have an important bearing 
upon individual time perspectives. 
Ketchum (1951) also examines the 
relation between temporal factors 
and social perception. 

For Rapaport (1946), the notion of 
anticipation is associated with a se-
quence of events extending into the fu-
ture and as such is intimately related
to time perspective and the general 
problem of time experience. Arieti 
(1947) also discusses anticipation and 
its role in personality development 
and psychopathology. 

Research Findings


Eson (1951) in-
vestigated the emphasis given to the
past and the future in five groups of 
subjects of different ages. Each sub- 
ject was asked to determine the tem- 
poral focus of at least 25 items de- 
rived from thoughts and conversations 
during the previous two-week period. 
On the basis of the data obtained in 
this manner, Eson concluded that 
each of the five groups tested revealed


228 MELVIN WALLACE AND ALBERT I. RABIN 


an orientation toward the future. 
Other measures of time perspective 
have been derived from data obtained 
by means of questionnaire and inter- 
view techniques. In one study con- 
cerned with the relationship between 
time perspective and morale in a 
group of prison inmates, Farber 
(1944) selected 28 variables from a 
wealth of case study and interview 
material. By means of correlational 
techniques, the relationships among 
these factors were evaluated, and it 
was concluded that the degree of pain 
and suffering in the prison situation 
correlated most highly with “factors 
involving time perspective, particu-
larly the future outlook . . . ,” and
not with the actual length of sen- 
tence or time already served—a find- 
ing related to that reported by Field 
(1931). On the basis of two other 
studies, involving college students 
and also utilizing questionnaire and 
other direct information-getting tech- 
niques, Farber (1951, 1953) reported 
that an individual’s current mood is 
more influenced by the psychological
future time perspective than by the
situation in which he finds himself. A
somewhat related finding was re- 
ported by Israeli (1933-1934) on the 
basis of questionnaire material ob- 
tained from college students. Israeli 
(1934, 1935a, 1935b) also investi-
gated the [. . .] perspective of unem-
ployed [. . .] patients and [. . .] men-
tally ill. [. . .] that the latter investi-
gator introduced the method of [. . .]
raphy (Israeli, 1936) and other tech-
niques (Israeli, 1932, 1933-1934)
[. . .] future.


In attempts to obtain data regard- 
ing time perspective which might be 
less subject to conscious control, sev- 
eral techniques, characterized by 
relatively unstructured or projective 
stimulus situations, have been uti- 
lized. LeShan (1952), for example, 
employing a Tell-Me-A-Story tech- 
nique, found evidence to support the 
hypothesis that the time perspectives 
of middle-class children are longer 
than those which characterize lower- 
class children. Other investigators 
(Barndt & Johnson, 1955) report that 
delinquent boys reveal shorter spans 
of time perspective and are more 
present-oriented than nondelinquents 
on a story-completion technique. A 
later investigation (Davids & Parenti, 
1958) with the same story comple-
tion method found no differences be-
tween disturbed and normal 11-year-
olds with respect to time perspective
(“present oriented”). The normal
[. . .] were more oriented in [. . .]
than the normal adolescents in the
Barndt & Johnson (1955) investiga- 
tion but were not different from the 
adolescent delinquent group in this 
respect. Expansion of the time per- 
spective may well be one aspect of 
maturation. 
Using a modification of the The- 
matic Apperception Test (TAT) as 
one measure of temporal orientation, 
Fink (1953) demonstrated that aged 
institutionalized individuals revealed 
a time perspective which was more
concerned with the past than that
found in their noninstitutionalized
contemporaries. Bonier and Rokeach
(1957) computed the percentages of
past, present, and future times used
in stories given by Ss to five TAT
cards in a study of the relationship
between dogmatism and time per-
spective and reported that Ss high in
dogmatism, compared with low scorers
on this variable, gave fewer present
and significantly more future re-
sponses. In a study of future time
perspective in schizophrenia, Wallace 
(1956) defined time perspective as 
“the timing and ordering of person- 
alized events.” Two aspects of fu- 
ture time perspective, termed exten- 
sion and coherence, were measured by 
means of a task in which S spontane- 
ously presented a number of antici- 
pated events and the possible age of 
occurrence of each of these events. 
Extension was defined as the length 
of the future time span which is con- 
ceptualized, while coherence refers to 
the degree of organization of the 
events in the future time span. A 
similar kind of conceptualization is 
provided by Arieti (1955), who dis-
cusses the “restriction of the psycho-
temporal field” and “seriation func-
tions” in schizophrenic disturbances.
Cohen et al. (1954) were involved in 
a study of what might be termed ex- 
tension and coherence for past events. 
Teahan (1958) reported that children 
high in academic achievement re- 
vealed both greater reference to the 
future and more extensive future 
time perspectives than low achievers. 
Optimism was also related positively 
to future extension in this study. 

In another recent paper Levine 
and Spivack (1957) reported on the 
relationship between time perspective 
and the ability to achieve scholasti- 
cally despite the removal of an im- 
mediate incentive. These investi- 
gators employed many of the items 
of Wallace’s (1956) method in meas-
uring what they termed “time con-
ception.”


Future Time Perspective and Psycho- 
pathology 

Theoretical views. A variety of writ-
ers suggest that emotional disturb-
ances may be characterized by ina-
bility to conceptualize the future.
Greenacre (1945) characterizes the
psychopath as lacking time perspec-
tive. Strauss (1947) states that de-
pressed patients typically are unable
to abandon the past and advance to- 
ward the future. According to Eissler 
(1952), depressed patients do not ex- 
perience the present as a bridge be- 
tween the past and future, but rum- 
inate and brood and view the future 
as if it were a “shapeless gap.” Lewis
(1931-1932) reports that cases char- 
acterized by feelings of depersonaliza- 
tion and a sense of unreality reveal 
an inability “to look into the future 
or to anticipate a future for oneself.” 
Meerloo (1948, 1950), in focussing 
upon neurotic difficulties which in- 
volve conceptions of the future, calls 
attention to “anticipation neurosis.”
Such conditions, Meerloo suggests, are
related not so much to the specific na- 
ture of childhood conflicts and early 
traumatic experiences, as to repeated 
expectations that similar unpleasant 
experiences will occur again.

Empirical findings. Of the many
published accounts concerning tem- 
poral disturbances in mental dis- 
order, only a few appear to have 
some relevance to the concept of time 
perspective. In one of these ac- 
counts, which is concerned with the 
level of aspiration of one psychoti- 
cally depressed case, Escalona (1940) 
mentions that the patient’s time per- 
spective had become more limited 
and that his outlook was determined 
almost completely by his psychologi- 
cal present. Utilizing questionnaire 
and interview material to study fu-
ture outlook, Israeli (1936) found that
drive with respect to future goals and
the nature of anticipated possibilities 
served as a basis for differentiating 
normals and abnormals, melancholics 
and paranoid schizophrenics. Psy- 
chopathological groups were char- 
acterized by limited time perspec- 
tives “scarcely extending beyond the 
present or an attenuated perspective
comprising barely a few future possi-
bilities” (p. 118). Wallace (1956) re-
ported that future time perspective is
influenced by the schizophrenic proc- 
ess to such an extent that both the 
length (extension) of the future span 
of time and the organization of its 
contents (coherence) are significantly 
reduced in patients as compared to 
normals. Adler (1954), in addition to 
the estimation of time experiments 
discussed in a previous section of 
the present review, devised an 81- 
item time perspective questionnaire. 
Many of the items differentiated to a 
statistically significant degree be- 
tween the normals, psychopaths and 
schizophrenics who served as Ss in 
his investigation. 
Findings on the relationship be-
tween future time perspective and
anxiety, which appear contradictory
to results cited earlier, were reported
by Lipman (1957). Although Lip-
man studied not psychiatric pa-
tients but college students with high
scores on the Manifest Anxiety Scale
(based on the MMPI), he confirmed
the hypothesis that anxiety “. . . en-
tails experimental components that
are strongly future-oriented in na-
ture.” A factor analytic treatment
of the nine scales related to future
time perspective yields two independ-
ent dimensions: “dismal unclarity”
and “exaggerated goal frustration
fears.”


Summary and Critique

The investigations of time perspec-
tive, reviewed above, [. . .]
toward significant findings [. . .]
any generalizations based upon [. . .]
cited earlier contain [. . .] dif-
ferent interpretation of this [. . .]

In the reports of LeShan (1952),
Bonier and Rokeach (1957), and 
Barndt and Johnson (1955) the con- 
cepts of time perspective and time 
orientation are used interchangeably, 
with the meaning of neither concept 
being explicitly specified. Both con- 
cepts are also mentioned in investi- 
gations by Eson (1951) and Fink 
(1953) but are utilized in somewhat 
discrete fashion, even though no spe- 
cific attempt to indicate their possi- 
ble interrelationships is made. Tea- 
han (1958) makes a distinction be- 
tween time orientation and time per- 
spective in his hypotheses concerning 
the relation of these variables and 
academic achievement. It is to be 
noted that the work of Eson (1951) 
and Fink (1953) in this area stems to 
a large extent from the theoretical 
position of Lewin who defined time
perspective as “the totality of an indi-
vidual’s views of his psychological
past and psychological future exist- 
ing at a given time” (1951). Such a 
formulation illustrates some of the 
difficulties encountered in conceptual 
and methodological approaches to 
the study of time perspective, in that 
the terms included within the defini-
tion require more precise specifica-
tion. Concepts such as psychological
past and psychological future, for ex-
ample, are subject to many interpre-
tations, and the results of any em-
pirical investigation of time perspec-
tive derived from Lewin’s formula-
tion are ambiguous unless adequate
[. . .] presented. [. . .] that the
[. . .] empirical findings would be dependent
upon the [. . .] that were utilized.

[. . .] specific definitions chosen for any in-
terrelated set of concepts describing
time perspective would limit the 
methodological tools that could be 
employed. Even within the range of 
potentially utilizable techniques, 
however, the choice and utilization of 


- any device would in itself tend to re- 


strict the meaning of the concept 
under study. As Pratt (1948) has 
pointed out, any fact is a function of 
the method by which it is derived. 

In view of the above considera- 
tions, the studies discussed in the 
foregoing review of the literature re- 
garding time perspective might most 
logically be evaluated in terms of the 
methods, as well as the specific set of 
definitions, that were employed. 
Both the particular methodological 
techniques and the specific defini- 
tions of time perspective utilized in 
these investigations were highly di- 
versified. To a large extent, there- 
fore, the results reported are not com- 
parable, and attempts to draw broad 
generalizations might be seriously 
questioned. Nonetheless, in view of 
the relatively sizeable number of sig- 
nificant findings reported, it does 
seem appropriate to conclude that 
both the direct and indirect ap- 
proaches to the problem of time per- 
spective have been quite fruitful. 
Further systematic research de- 
veloped from a consistent theoretical 
and methodological point of view ap- 
pears necessary to provide a basis for 
obtaining additional definitive in- 
formation about the nature of time 
perspective and its relation to phe- 
nomena in other areas. 


GENERAL SUMMARY—TIME 
PERCEPTION AND TIME 
PERSPECTIVE 


The concept of time is a complex 
one. It ranges from relatively primi-
tive notions of “before” and “after,”
as in early infancy, and definite
points or events, as in preliterate cul- 
tures, to the abstract idea of a con- 
tinuous flow which is divisible into 
equal units of duration. The latter 
idea, of course, is related to the ob- 
served periodicity in cosmic changes 
and movement. 

Attempts to localize the “time
sense” cortically or in any functional 
physiological system of the body 
have so far received little experi- 
mental support. There appears to be 
more evidence which points toward 
the development of temporal experi- 
ence which parallels the evolvement 
of self or ego. This development be- 
gins early in life and continues with 
the increasing consciousness and dis- 
crimination of events which serve as 
the boundaries for the conceptual- 
ized periods of time. The relationship 
of the personal experiences and 
events to conventional units of time 
is learned and depends upon the cul- 
tural setting from which a person 
originates. The abstraction of time 
as continuously flowing and
boundless in both the past and the fu-
ture is a relatively late acquisition
of the developing individual even in 
the most literate societies. 

The capacity to estimate, repro- 
duce or produce specific units of time 
is a judgmental process (Woodrow, 
1951) which is based on the experi- 
ence and consciousness of events and 
a reflection upon them in relation to 
previous experiences as related to 
conventional units of time. Like 
most judgmental processes, the judg- 
ment of time becomes impaired under 
a variety of conditions involving 
drugs, cortical changes and psychosis. 
Other, special conditions, such as ex- 
treme affective states and anxiety, 
may have similar impairing effects. 
The possibility should be considered 
that certain individual differences in
the awareness and judgment of time
exist and are due to certain develop- 
mental and experiential events in the 
organism. This relationship, how- 
ever, between temporal experience 
and personality factors is yet to be 
clearly demonstrated.

A further expansion of the abstrac- 
tion of time is reflected in the con- 
cepts of time orientation and, 
particularly, “time perspective.” 
Whereas the experiments in the per- 
ception and estimation of time deal 
with relatively brief periods of time 
(usually seconds and minutes), time 
perspective is concerned with long 
periods of time, the limit of which is 
one’s expected lifespan itself. The 
units of time, in addition to direction 
(e.g. past, future, etc.), are usually 
days, weeks, months, years, and 
decades. Here the time periods are 
not artificially limited by E's produc- 
tion of stimuli; the “points” in time 
which delimit the periods are im-
portant events in the life of each S.
It follows, therefore, that time per- 
spective involves a molar rather 
than molecular (or atomistic) ap-
proach to the problem of temporal
behavior. It involves the total per-
sonality, memory for past events,
and hopes, aspirations, and anticipa-
tions of future events. The data
which are obtained in a “projective” 
fashion involve temporal quantity 
(“extension”) as well as orderly ar- 
rangement of events in a logical se- 
quence (coherence), and direction or 
temporal “orientation” (whether pri- 
marily past, present, or future). Fi- 
nally, traditionally in perception ex- 
periments dealing with time, the 
stimulus or the conditions under 
which the stimuli were presented 
were the major focus of E's atten-
tion. In time perspective, intra-in-
dividual, rather than extra-individual,
conditions are of main concern. The
person’s “projection of the self in the 
temporal dimension,” as part of the 
uniqueness of personality, is of major 
interest in the study of time perspec- 
tive. 

It will be, of course, of particular 
concern to future investigators to 
work out in detail a theory of tem- 
poral behavior and subsequently re- 
late it to a variety of significant per- 
sonality variables. The more precise 
relationship between time perception 
and time perspective, although de- 
lineated conceptually, is in need of 
empirical and experimental under-
pinning and substantiation. This is
a task for the future.


There are many secondary sources
which, during discussions of massed 
reinforcement, refer to postasymp- 
totic decrements in performance after 
extended training. One recent ex- 
ample of this is found in Deese 
(1958): “If, during conditioning, 
trials are massed very close together, 
the animals may slow their rates of 
responding and even stop altogether, 
even though the reinforcement is still 
present . . .” (p. 53). Similar state-
ments are made by Bugelski (1956,
pp. 166, 287, 349), Deese (1952, p. 
56), Hilgard and Marquis (1940, p. 
146), Hovland (1950, p. 616) and 
Hull (1943, p. 189f). The tenor of the 
discussions is that under conditions 
of massed reinforcement, sufficient 
numbers of trials will eventually lead 
to a decrement in the strength of a 
conditioned response. Some authors 
have labeled the genesis of this phe-
nomenon “inhibition of reinforce-
ment,” a term coined by Hovland in
his Science article of 1936. 

Hull has treated “inhibition of
reinforcement” much more formally 
in his Principles of Behavior (1943). 
His thirteenth corollary (1943, pp.
189-191) is: “In the case of closely
massed reinforcements, the curve of 
acquisition of effective excitatory 
potential . . . will be distorted by in- 
hibition of reinforcement . . . in ex- 
treme cases showing an actual fall 
with continued practice.” This theo-
retical position is in essential accord
with the descriptions mentioned in
the secondary sources cited above. 
In general, it would be predicted that 
continued, massed reinforcements will 
produce enough inhibition (built up 
with response evocation) to eventu- 
ally offset the effects of the reinforce- 
ments, thus resulting in a gradual 
performance decrement even though 
reinforcement is not withdrawn. As a
term, inhibition of reinforcement is 
not mentioned in Hull’s final system 
(Hull, 1952); however, as Gleitman, 
Nachmias and Neisser (1954) point 
out, the Hullian construct of condi- 
tioned inhibition clearly implies that 
performance in a learning situation 
will asymptote, then gradually de- 
cline with continued, massed rein- 
forcements. In fact, once perform- 
ance has reached asymptote, the 
decline should occur equally rapidly 
with or without continued reinforce- 
ments. 

We therefore have a situation 
where current secondary sources ac- 
cept, as demonstrated, performance 
decrements during the massed rein- 
forcement of a conditioned response 
and where a theoretical position pre- 
dicts these decrements on the as- 
sumption of inhibition building up 
each time a response is made without 
adequate time for it to dissipate be-
tween trials. Deese (1958, p. 53) goes
so far as to call this a “matter of ex-
perimental fact.” It is the purpose of 
this paper to examine carefully the 
evidence for a performance decre- 
ment (as herein labeled for the rest of 
the paper) as a function of massed re-
inforcements. Classical and instru-
mental conditioning will be consid-
ered separately, to be followed by
general comments and conclusions. 


Classical Conditioning Studies 


Historically, as might be expected, 
performance decrements were first 
observed by Pavlov (1927, p. 234f). 
Pavlov is cited in several secondary 
sources (Bugelski, 1956; Hovland, 
1950; Hull, 1943) as providing data 
which indicate performance decre- 
ments with massed reinforcements, 
and it is this proposition that will be 
examined.

Several facts of the Pavlov data 
have much import for performance 
decrements and the conditions under 
which they might occur. First, condi- 
tioned responses learned to a condi- 
tioned stimulus of 30 seconds’ duration 
very often spontaneously extinguish
despite continued reinforcements. The
characteristic of this response decre-
ment and ultimate loss is its gradu-
ally increased latency until it is tem-
porally coincident with the onset of
the food (unconditioned stimulus).
With further reinforcements, a test 
trial (conditioned stimulus alone) 
will not elicit the conditioned re- 
sponse even during the time the un- 
conditioned stimulus would ordinarily 
have been presented. The intertrial 
interval in these situations is usually 
at 10 minutes or more on any given 
day of experimentation, and the per- 
formance decrement itself may re- 
quire weeks, months or even years to 
show up. 

Second, when a conditioned re- 
sponse is established to a conditioned 
stimulus of 10 seconds’ duration, 
changing the duration of the condi- 
tioned stimulus to 30 seconds results 
in a loss of the conditioned response.

Third, a conditioned response 
learned to multiple conditioned stim-
uli may manifest a performance dec-
rement to one of these stimuli, even
to the extent that the unconditioned
stimulus (food) is rejected, yet will 
occur in full strength to any one of 
the other stimuli to which the re- 
sponse has been learned. Further, 
when the former conditioned stimu-
lus precedes one which still results in
the response, a performance decre-
ment occurs to the latter, also.

Fourth, there are two ways to rein- 
state a spontaneously extinguished 
conditioned response. If the response 
has been learned to a 30-second con- 
ditioned stimulus, it is possible to 
reduce the interstimulus interval to 
three or five seconds for several trials 
after which it is found that the re- 
sponse will follow the conditioned 
stimulus under the prior 30-second 
interstimulus interval. There are oc-
casions, with highly trained animals,
where all conditioned stimuli cease to
have an effect and the animals even
reject food which follows the condi-
tioned stimulus presentation. That
the animals are not satiated is indi-
cated by the fact that they accept
food presented in isolation from the
conditioned stimulus. Under these
circumstances a conditioned salivary
response can be reinstated by em-
ploying entirely new conditioned
stimuli: the response is quickly
learned and stabilized to such new
conditioned stimuli.

Finally, establishing a conditioned 
response to several reinforced stim-
uli, presented in order at 10-minute
intervals, and then repeating one of
those stimuli a number of times in a
row while not employing the others
may lead to a performance decrement
to that stimulus (though it is still be-
ing reinforced), but leave the re-
sponse to the other stimuli essentially
unaffected. If this stimulus is re-
peatedly reinforced at intertrial inter-
vals of one-and-a-half minutes, a
response decrement will occur; but
presenting one of the other stimuli
within that size interval will still re- 
sult in the occurrence of a conditioned 
response. Further, when the 10- 
minute intertrial interval is restored, 
the responses again come back in full 
strength. 

Several features of Pavlov’s data 
prevent one from making the quick 
generalization that repeated, massed 
reinforcements in and of themselves 
will result in performance decre-
ments. First, the manner in which
the conditioned response spontane- 
ously extinguishes via increased la- 
tency, and the fact that conditioned 
responses do not occur when the con- 
ditioned stimulus is changed from a 
10- to a 30-second duration suggest 
that temporal relationships between 
the conditioned and unconditioned 
stimuli may have an effect. It is well
known, for instance, that large inter- 
stimulus intervals do not result in a 
well-established conditioned eyeblink 
response (e.g., Bernstein, 1934). 
There is the possibility that a re- 
sponse can be acquired to some de- 
gree of proficiency to a long duration 
conditioned stimulus, but that even- 
tually the organism responds to the 
duration in the form of increased la- 
tency, finally resulting in a complete 
loss of the response because of an 
inability to discriminate such long 
durations accurately enough. 

The facts that a conditioned re- 
sponse lost to one stimulus is unaf- 
fected in the presence of another 
stimulus and that a conditioned re- 
sponse lost to one stimulus can be 
trained to a new stimulus suggest 
more than the effects of continued 
reinforcement, massed or otherwise. 
The data indicate the possibility of 
an interaction between the reinforce- 
ment and a particular conditioned 
stimulus. In other words, if inhibi-
tion builds up, thus prohibiting the
occurrence of the conditioned re-
sponse, the inhibition is specific to a
particular stimulus and is not solely 
the result of repeated massed rein- 
forcements. It is to be noted that the 
conditioned stimuli employed by 
Pavlov were of varying effectiveness 
as far as degree of conditioning is 
concerned, and, further, that when 
performance decrements did occur 
depended to a certain extent upon 
the quality of the conditioned stim- 
ulus. A specific example of this 
(Pavlov, 1927, pp. 235, 236) is when a
response is established to a tactual 
stimulus within 179 trials. With a 
new conditioned stimulus (45-degree 
centigrade cutaneous stimulation) 
substituted, the conditioned response 
became established, but by the thirty- 
third trial the strength of the condi- 
tioned response had dropped to zero. 

With respect to the massing factor, 
Pavlov generally employed an inter- 
trial interval of 10 minutes or more 
between reinforcements. It is diffi- 
cult to conceive of this as massed re- 
inforcement. 

The one feature of Pavlov’s data 
which would appear to support the 
notion of performance decrements as 
a function of massed reinforcements 
is the result which obtains when the 
intertrial interval for an established 
conditioned response is changed from 
10 to 1½ minutes. As pointed out
above, this not only led to a perform-
ance decrement, but also to a rejec-
tion of food.³ The occurrence of the
response when another stimulus was 
introduced can be treated as an ex-
ample of disinhibition. This experi-
mental condition was not studied


3 The fact that the organism could reject
the unconditioned stimulus constitutes a de-
parture from strict classical conditioning
methodology. Just what this [. . .]
performance of highly trained Ss, as were
Pavlov's, is not known.


very often, as, according to Pavlov,
recovery of the organism from the 
aftereffects of reinforcements was 
rapid (within one-and-a-half min- 
utes) only for the particular dog 
yielding the results reported above. 
It is doubtful that the concept of per- 
formance decrement as a function of 
massed reinforcements should rest on 
the results of fewer than 30 massed 
trials on one animal; but, even in this 
animal, massing was not the sole 
factor operating as he rejected food 
on the last few trials with the original 
stimulus while accepting it when the 
disinhibiting stimulus was presented. 
In general, Pavlov’s data provide a 
number of instances of performance 
decrements, but the degree to which 
massed reinforcements can be con- 
sidered the major contributing factor 
is highly limited. 

At least three sources (Bugelski,
1956; Deese, 1952, 1958) cite a study
by Hovland (1936) as evidence for
performance decrement. Hovland
administered 8 and 24 shock rein-
forcements to two groups of Ss in a
classical GSR conditioning experi- 
ment. No acquisition data are pre- 
sented, although when extinction 


4 It is interesting to observe that the origi-
nal definition of inhibition of reinforcement
was not a performance decrement during
massed reinforcements. Hovland (1936) states:
“... contrary varieties of curves of
experimental extinction have been described.
The first type is characterized by a continu-
ous decline ... of the response on successive
unreinforced elicitations. ... second
... larger response on the ... extinction
trial than on the ...” ... as do other
authors (Bugelski, 1956; Deese, 1952, 1958;
...).

followed the 24-trial acquisition series
immediately there was an initial rise 
in the extinction curve. It is this rise
that Hovland attributes to the after- 
effects of inhibition of reinforcement. 
There are no data, however, in this 
study which indicate a performance 
decrement as herein defined. The 
only results given are extinction per- 
formances, where each curve is pre- 
sented in terms of the first extinction 
response used as a base from which to 
compute the percentages of subse- 
quent responses during the extinction 
trials. It would appear that, while 
Hovland did coin the phrase “inhibi-
tion of reinforcement,” there is no
evidence in his study for inhibition of 
reinforcement in its contemporary 
usage, contrary to the statements of 
the aforementioned authors.

Hilgard and Marquis (1940) cite
Wendt (1930) as providing an ex- 
ample of performance decrement in 
classical conditioning. Wendt con-
ducted an extensive investigation of
the characteristics of the conditioned 
knee jerk, utilizing a blow to the right 
patellar tendon as a conditioned stim-
ulus followed one-fifth of a second
later by a blow to the left patellar 
tendon as an unconditioned stimulus. 
He discovered two kinds of response, 
distinguished primarily by latency. 
The first was a bilateral response with 
a latency of from 120 to 180 millisec- 
onds. This particular response in- 
creased very rapidly in frequency 
over successive trials, then decreased 
during the final training trials. It 
would appear that it is to this re- 
sponse that Hilgard and Marquis 
refer. The second response took more 
trials to develop, and occurred with a 
latency of 200 to 500 milliseconds.
response, and did not decrease in fre-
quency with repeated reinforcements.
It is doubtful that the former re- 
sponse would be considered as a con- 
ditioned response by today’s stand- 
ards. Its latency and some of its 
trial-by-trial characteristics strongly 
resemble those of the “alpha” re- 
sponse (Grant, 1942, 1945; Prokasy 
& Truax, 1959), a reflex response to 
the conditioned stimulus in classical 
eyelid conditioning research, the fre- 
quency of which decreases with re- 
peated stimulus presentations. The 
second response discovered by Wendt, 
however, not only did not drop out, 
but also falls within a latency range 
often delineated for the classically 
conditioned response (see, e.g., Pro- 
kasy, Grant, & Myers, 1958). Again, 
the evidence for a performance decre- 
ment as a function of massed rein- 
forcements is slim. 

Wolfle’s now classic study of con- 
ditioned finger withdrawal (Wolfle, 
1932) is cited by Hilgard (1933) as 
further evidence of this unique kind 
of extinction. Wolfle employed 72 Ss 
spread over 13 interstimulus interval 
conditions. In four of these condi- 
tions the unconditioned stimulus pre- 
ceded the conditioned stimulus, in 
eight the conditioned stimulus pre- 
ceded the unconditioned stimulus 
and in the final condition the two 
stimuli were temporally coincident. 
Subjects were run in anywhere from 
two to nine experimental periods of 
40 minutes’ duration, the tests for 
conditioning being 10 irregularly 
spaced trials during the last 20 min- 
utes of the session. The final results 
were that, of 72 Ss, 25 did not 
condition, 18 indicated no constant 
change from session to session, 8 
showed a performance increment and 
20 a performance decrement. Com- 
bining individual S data within each 
of the eight forward conditioning
groups indicates that under several
experimental conditions an over-all 
numerical decrease in percentage of 
conditioned responses occurred. In 
none of these conditions could the 
drop be considered significant with 
the possible exception of the .3-sec- 
ond interstimulus interval group. 
However, the facts that, (a) over 
one-third of the Ss did not condition, 
(b) performance decrements were, at 
best, slight and, (c) the data were 
highly variable, all militate against 
accepting this study as serious evi- 
dence for a reliable performance dec- 
rement as a function of massed rein- 
forcements. 

Techniques of classical condition- 
ing that are relatively standard today 
were employed by Hilgard (1933) 
who investigated the conditioned 
eyeblink response to a conditioned 
stimulus of light and an uncondi- 
tioned stimulus of sound in two Ss. 
One of the Ss not only conditioned, 
but also manifested a decrement in 
amplitude of response after the 150th 
trial (50 trials a day for eight days) 
which was not regained. In addition 
to the fact that only one S gave this 
result (even though he constituted 
50% of the sample), there is the dis- 
tinct possibility that the uncondi- 
tioned response to the tone as an 
unconditioned stimulus simply adapt- 
ed out, the loss of the conditioned 
response being a function of the 
decreased motivational properties of 
the tone. It is already known that 
the reflex response to light adapts out 
(Grant, 1942, 1945; Prokasy & Truax, 
1959), and it is not unlikely to have 
occurred to the tone, also. Be that as 
it may, the present study is difficult 
to generalize because of a one-S limi- 
tation. 

The Newhall and Sears study 
(1933), cited as evidence for a per-
formance decrement by Kantrow
(1937), is actually a psychophysical
investigation comparing the results 
of a standard psychophysical tech- 
nique for establishing a visual thres- 
hold with those of a conditioned 
finger withdrawal technique using 
weak visual stimuli as the condi- 
tioned stimuli. There were seven Ss: 
one undergraduate and six profes- 
sional psychologists (including New- 
hall and Sears). First, visual thres- 
holds were obtained by the method of 
constant stimuli, then Ss were re- 
quired to report whether or not they 
saw a visual stimulus at the time they 
were being shocked following the 
visual stimuli. Interstimulus inter- 
vals were two seconds for three Ss, 
one second for two Ss and a half-sec- 
ond for the other two Ss. (Note, 
here, that the interstimulus interval 
was not conducive to good condition- 
ing for five of the seven Ss.) The 
three Ss for whom the performance 
decrement was observed were the 
three who did not actually condition
(made very few conditioned re- 
sponses) in the first place. Limited 
N, inappropriate interstimulus inter- 
vals and the near or below threshold 
level of the conditioned stimuli place 
serious limitations on the degree to 
which any of these data can be gen- 
eralized as evidence of a performance 
decrement. 

Kasatkin and Levikova (1935),
mentioned as support of performance
decrements during reinforced condi-
tioning trials ..., investigated con-
ditioned ... responses in three ...
lactating nipple ... infant’s mouth for
30 seconds. When the sucking (con-
ditioned) responses were firmly es-
tablished, the conditioned stimulus
was extended to a 10- to 15-second 
duration prior to the introduction of 
the unconditioned stimulus. The Ss 
received from 70 to 120 sessions over 
four or five months, with intertrial 
intervals of 10 to 60 seconds within 
each session. In two Ss the condi- 
tioned response dropped out after it 
had been established. Aside from the 
small N, two difficulties present them- 
selves. First, the duration of the 
conditioned stimulus was changed in 
the middle of the experiment, and, 
second, the longer duration inter- 
stimulus intervals are, as mentioned 
above, as a rule not compatible with 
a high level of conditioning. 

Kantrow (1937), cited by Hilgard 
and Marquis (1940), conducted a 
much more extensive study of condi- 
tioned sucking responses in the hu- 
man infant. There were four stages 
to each conditioning cycle. First 
there was a control period (intertrial 
interval) of 25 to 75 seconds, followed 
by a buzzer as a conditioned stimulus
of five seconds’ duration. Then came 
the lactating nipple in conjunction 
with the buzzer for another 15 sec- 
onds, followed by the milk alone for 
from 15 to 120 seconds. This proce- 
dure was repeated for each of 16 in- 
fants (ages 44 to 117 days at the 
beginning of the experiment) during 
a single feeding until the infant either 
finished the bottle or refused to drink 
any more. These trials were admin- 
istered in place of the regular four- 
hour feeding schedule. Five of the 
infants were given more than 16 con- 
ditioning sessions, and each of these 
five emitted fewer sucking responses 
during the final several sessions. This 
decrease in sucking behavior during 
the five-second duration conditioned 
stimulus was correlated with wide 
fluctuations in gross bodily activity.
While the data from this experiment
indicate a performance decrement, it
is not at all clear just how hunger, as 
the motivation, operates in this situa- 
tion. By contemporary standards it 
is unusual classical conditioning meth- 
odology to infuse the unconditioned 
stimulus for such long periods of time 
and so to design the research that 
responding decreases to zero during 
the course of each training session. 
These two features of the present 
study should be explored more fully.

Hull (1943) cites an unpublished
study by Calvin in which, under 
massed reinforcement training, there 
is approximately a 2 to 4% drop in 
frequency of conditioned eyeblink 
responses from the 21-30 trial block 
to the 41-50 trial block. While Hull 
puts enough credence in this result, 
along with the results of Pavlov 
(1927), to aid in the development of 
his thirteenth corollary, it is doubtful 
that such a performance decrease is 
reliable. Furthermore, recent re- 
search into the distribution of prac- 
tice in eyeblink conditioning is not 
consistent with this finding (see e.g., 
Prokasy, Grant, & Myers, 1958).

Razran (1955) reports a prelimi-
nary study in which a group of Ss
being tested on the first two extinc- 
tion trials after 36 reinforcements are 
superior in performance to a group of 
Ss being so tested after 72 reinforce-
ments. This is put forward as evi-
dence for “decrement through over-
reinforcement,” or, in the present
context, performance decrement. The
particular response under investiga- 
tion was the magnitude of a human 
salivary response, classically condi- 
tioned to a 30-second duration series 
of flashing lights. The unconditioned 
stimulus in his experiment was a
small pretzel. Two Ss served in a 12- 
trial-a-day condition, and two served 
in a 24-trial-a-day condition. All Ss 
served on four successive days, the
first three of which were devoted to
either 12 or 24 conditioning trials, 
the last day consisting of 18 extinc- 
tion trials. It is to be noted that the 
test for amount of conditioning was 
made 24 hours after the last condi- 
tioning trials, and that this test indi- 
cated that the two Ss in the 36-trial 
group salivated significantly more 
(by a t test!) than did the two Ss in
the 72-trial group. While no knowl- 
edge of the motivational properties of 
the 72nd pretzel is available, more
recent evidence suggests that the
decrement observed by Razran is
very possibly not a “true” perform-
ance decrement. Prokasy (1958)
administered 20, 40 and 60 classical
eyelid conditioning trials to three
groups of Ss respectively, and ad-
ministered extinction trials 24 hours
later. Although the 60-trial group
was superior in performance to either
the 20-trial or 40-trial groups, the
loss in performance over the 24-hour
interval was far greater for the 60-
trial group (60% drop vs. 40% drop
for the 20-trial group). This greater 
performance drop (as measured in 
the first five extinction trials) of the 
60-trial group placed their perform- 
ance below that of the 20-trial group 
during extinction. As Razran pro- 
vides no data on the performance of 
the 72-trial group at the end of condi- 
tioning as compared to the 36-trial 
group, the critical comparison for a 
performance decrement is lacking; 
namely, performance itself being 
lower at one point in training than it 
was at some earlier point. 
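
The arithmetic behind the Prokasy (1958) comparison can be made explicit. The sketch below is only an illustration: the 60% and 40% proportional losses are the figures quoted above, but the terminal acquisition levels are hypothetical values chosen for the example and are not Prokasy's data. The function names are likewise invented here.

```python
def retained(terminal_level, proportional_drop):
    """Performance remaining after a proportional loss over the retention interval."""
    return terminal_level * (1.0 - proportional_drop)


def crossover_occurs(level_60, level_20, drop_60=0.60, drop_20=0.40):
    """True when the 60-trial group falls below the 20-trial group after the loss."""
    return retained(level_60, drop_60) < retained(level_20, drop_20)


# Hypothetical terminal percentages of conditioned responses (not from the source).
level_20_trials = 50.0
level_60_trials = 70.0

print(retained(level_20_trials, 0.40))   # 20-trial group at the start of extinction
print(retained(level_60_trials, 0.60))   # 60-trial group at the start of extinction
print(crossover_occurs(level_60_trials, level_20_trials))
```

Under these proportional losses the 60-trial group ends below the 20-trial group whenever its terminal level is less than 1.5 times that of the 20-trial group, since 0.4 x L60 < 0.6 x L20 reduces to L60 < 1.5 x L20. A group can therefore be superior at the end of training and still inferior on a delayed extinction test, which is why a delayed test alone cannot establish a performance decrement.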

Although the data from classical 
conditioning studies which support 
the notion of a performance decre- 
ment as a function of massed rein- 
forcements are severely limited, there 
is no lack of studies involving reason- 
ably extensive training providing 
negative results. One recent study,
in particular, stands out. Ross (1959)
administered 200 massed (average 
intertrial interval of 15 seconds) 
training trials to 20 Ss in a classical 
eyelid conditioning experiment. There 
was no evidence of a performance 
decrement despite the fact that the 
number of massed trials at one ses- 
sion is greater than the number used 
by Razran (1955), Calvin (in Hull, 
1943) or Pavlov (1927). 


Instrumental Conditioning Studies 


The relevant research in instru- 
mental conditioning in support of 
performance decrements as a func- 
tion of massed reinforcements has 
more to recommend it than do the 
classical conditioning studies. Calvin,
Clifford, Clifford, Bolden and Harvey 
(1956) report that 24 rats simply quit 
running down a straight 10-foot run- 
way toward a goal box despite con- 
tinued reinforcement. These investi- 
gators employed two levels of food 
deprivation (half of the animals re- 
ceived a total of 10 grams of food a 
day and the other half received a 
total of 12 grams of food a day) and 
two levels of distribution of practice 
(three minutes between trials for half 
of the animals and no delay between 
trials for the other half), with six ani- 
mals in each of the resulting four ex- 
perimental conditions. The criterion 
of extinction (during continued rein-
forcements) was two successive trials
in which the animals refused to run 
within five minutes of being placed in 
the start box. Thirty trials a day
were administered to all animals until 
this criterion was met. In brief, the
criterion was met by all animals from
5 to 94 days of training. These in-
vestigators also found that the 10-
gram group required a greater num-
ber of trials to reach the criterion
than did the 12-gram group, and that
there was a significantly greater run-
ning time for the massed groups. No
other differences were statistically
significant.

There are some difficulties in both 
the design and the results of this 
study which should be mentioned. 
First, reinforcement was doled out by 
permitting each animal to eat from a 
food dish, containing the full daily 
ration, for a period of 10 seconds. At 
the end of each day's trials the re- 
maining food was placed in the cage 
with the animals. This type of feed- 
ing procedure is to be contrasted to 
the usual methods of maintaining 
body weight or a certain number of 
actual hours of food deprivation be- 
fore the training trials. 

Second, when an animal did not 
run within the five-minute criterion, 
he was placed in the goal box rather 
than removed from the apparatus. 
It is not unreasonable that this treat- 
ment had some effect on the runway 
behavior. 

Third, the authors cite their data 
as support for conditioned inhibition, 
yet do not discuss the theoretical 
implications of the fact that the 
massed and distributed groups of Ss 
did not differ significantly in the
number of trials required to reach 
criterion. 

Finally, and in contrast to the data 
in classical conditioning, no mention 
is made of gradually increasing run- 
ning times which would be expected 
from both the Hullian framework of 
conditioned inhibition as well as the 
usual conception of inhibition of rein-
forcement in secondary sources (e.g.,
Deese, 1952, 1958; Hilgard & Mar- 
quis, 1940). 

Keehn and Sabbagh (1958) ran two 
groups of rats in a conditioned avoid- 
ance situation. One group of animals 
(N=5) received 100 trials of light-
shock sequence. If they made a
treadmill-running response the shock
was cut off, and if the running re-
sponse came prior to the onset of the
shock, no shock was received. The
other group of animals (N=5) also
received 100 such trials, but they 
were administered 10 a day for 10 
days with a three-minute average 
intertrial interval within each day. 
It was found that the first group 
made more avoidance responses than 
did the second; and, although over- 
all conditioning was low, the second 
group manifested a decrease in avoid- 
ance behavior after the fifth session. 
In other words, the performance dec- 
rement came, not in the massed 
training group, but in the spaced 
training group. While this result is 
not easily interpreted, Keehn and 
Sabbagh suggest that the low condi- 
tioning level of the spaced group may 
be due to the infrequent trials pre- 
sented each day, that there were in- 
sufficient conditions for performance 
to have occurred and remained 
stable. Despite the fact that Calvin 
(1958) persists in calling this particu- 
lar result a case of conditioned in- 
hibition, neither the concept of con- 
ditioned inhibition nor that of inhibi- 
tion of reinforcement can handle the 
finding that the spaced group dete-
riorated in performance while the
massed group did not.

A well-conceived study in this area
has been reported by Kendrick
(1958a). Ten rats were given 30 
trials a day in a 12-foot runway under 
20 hours of water deprivation with 
.25 cc. of water reinforcement. Each
S was given daily trials until the ex- 
tinction (performance decrement) cri- 
terion of refusal to run within five 
minutes on two consecutive trials 
during the first five daily trials for 
three consecutive days was met. 
Three hours after being run, Ss were 
given a half-hour ad libitum drinking 
period. All 10 Ss eventually stopped
running, the over-all mean days to
criterion for the 10 animals being 
33.2 days. A retest 36 days after ex- 
tinction yielded no evidence of spon- 
taneous recovery. As in the Calvin 
et al. (1956) study, the behavior ex- 
tinguished abruptly. A plot of mean 
running times over the six days prior 
to the three extinction criterion days 
indicates that the transition was not 
gradual as would be expected from a 
Hullian point of view (see Gleitman, 
Nachmias & Neisser, 1954). Log run- 
ning times changed from an over-all 
average of one the day prior to the 
extinction series to two and a half on 
the first of the three criterion days. 

Prior research (Seigel, 1947; Stellar 
& Hill, 1952) indicates that the total 
30-trial reinforcement of 7.5 cc. em- 
ployed in this study is only slightly 
greater than what a 24-hour deprived 
animal will drink in five minutes and 
far less than the 34 cc. ad libitum 
daily drinking found by Stellar and 
Hill with nondeprived animals. Thus, 
low drive level does not appear to be 
a factor in the findings. While Keehn 
(1959) suggests that unreinforced 
licks of the watchglass which pro- 
vided water may be responsible for 
the results, this is highly speculative 
in light of the fact that the animals 
were water deprived, and extin- 
guished within the first five trials on 
the criterion days. Kendrick’s un- 
usual findings differ from the usual 
form of extinction in that cessation 
was abrupt and there was no spon- 
taneous recovery. 
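
The drive-level argument rests on simple arithmetic, which can be checked directly. The sketch below assumes a per-trial reinforcement of .25 cc., which is what a 7.5 cc. total over 30 trials implies; the 34 cc. figure is the Stellar and Hill value quoted above. The variable names are invented for the illustration, and the half-hour ad libitum drinking period given after each day's run is not included.

```python
trials_per_day = 30
cc_per_trial = 0.25        # assumed: 7.5 cc. total divided by 30 trials
ad_lib_daily_cc = 34.0     # Stellar and Hill (1952), nondeprived animals

# Water obtained in the runway on one training day.
daily_runway_intake = trials_per_day * cc_per_trial

# Runway intake as a fraction of ordinary ad libitum daily drinking.
fraction_of_ad_lib = daily_runway_intake / ad_lib_daily_cc

print(daily_runway_intake)
print(round(fraction_of_ad_lib, 2))
```

On these figures the runway supplies well under a quarter of the ad libitum daily intake, which is consistent with the statement that low drive level does not appear to be a factor.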

In a recent note, Kendrick (1958b)
informally reports that employing a 
¼ cc. water reinforcement for travers-
ing a 10-foot runway and pressing a
lever requiring 60 grams of pressure 
leads to performance decrements un- 
der massed training conditions. He 
states that the running response ex-
tinguishes, but that the bar pressing
itself does not extinguish. No data
are presented, hence no assessment 
of this report can be made. 

As in the case of classical condition- 
ing, there is no lack of evidence for 
persistent responding over extensive 
spans of time when motivation is 
properly controlled. The reams of 
raw data presented by Ferster and 
Skinner (1957) are clearly adequate 
testimonial of perseverative behavior
on the part of at least two animal
species. 


CONCLUDING COMMENTS 


The hypothesis examined in this 
paper is essentially that massed rein- 
forcements will result in a postas- 
ymptotic decrement in performance 
of a learned behavior. In classical 
conditioning there are good examples 
of performance decrements, primar- 
ily from Pavlov (1927), but these 
decrements in general are not a func- 
tion of massed trials. Other factors, 
such as an interaction between par- 
ticular conditioned stimuli and the 
conditioned and unconditioned re- 
sponses, seem to be operating. The 
two instances of a performance dec- 
rement related to massing are both 
based upon one S (Pavlov, 1927;
Hilgard, 1933), and are, therefore, 
severely limited in generality. 


In neither ... is there a control ...
permits the deduction that the dec-
rements are a function ... In the first
study (Calvin et al., 1956) there were
... distributed groups ... groups did not
... of trials to reach ... dis-
tributed trials control group. Thus,
... running,
with no manipulated independent
variable correlated with this result,
there is no way to make any decision
concerning which factors in the ex-
perimental situation would consis-
tently produce such a decrement.
In general, then, the hypothesis 
examined cannot be accepted: to date 
there is insufficient evidence to con- 
clude that massed reinforcements in 
and of themselves will produce a 
performance decrement in the pres-
ence of repeated reinforcement. The
present studies do, however, indicate 
several avenues of research available 
in attempting to isolate the variables 
correlated with the obtained per- 
formance decrements. In classical 
conditioning, multiple conditioned 
stimuli ought to be used in an attempt 
to determine why Pavlov’s data in-
dicate that some stimuli interact with
the particular response system to pro-
vide performance decrements. An-
other complicating possibility is the
long-term effects of particular condi- 
tioned stimuli. If there are short- 
term inhibition effects, Pavlov’s data 
suggest, in addition, a second inhibi-
tion component related to highly
overtrained Ss that spontaneously ex- 
tinguish after years of performance. 
Very little is known, furthermore, 
about the characteristics of the un- 
conditioned response in classical con- 
ditioning. Any change in the uncon- 
ditioned response, such as adaptation 
to the unconditioned stimulus or
changes in the recovery time from the
unconditioned response, could well
influence conditioned response per-
formance.


In instrumental conditioning, situ-
ations similar to that of Kendrick
(1958a) ought to be more fully ex-
plored with variations in intertrial
interval, length of runway, and
amount of reinforcement. It remains
to be seen, also, whether or not the 
sudden performance decrements ob-
served in the straight runway will oc-
cur in other instrumental response
situations. If any general concept of
a performance decrement is to be de-
veloped, clearly it must not be
peculiar to the straight runway.


ERRATUM


In the article “Partial Reinforce-
ment: A Selective Review of the Lit-
erature Since 1950” by Donald J.
Lewis (Psychol. Bull., 1960, 57, 1-
28), the figure in the sixth line of page
19 should read 15%.


Until fairly recently interest in the
ontogenetic development of percep- 
tion on the part of English-speaking 
psychologists has been sporadic at 
best. Work on this problem has been 
generally ignored in the main litera- 
ture in the field of perception, as in 
that of child development, although 
Vernon’s (1954) survey of percep- 
tion and Werner's (1957) compara- 
tive psychology treatise represent 
notable exceptions in this regard. 

There are, however, signs of a defi- 
nite reawakening of interest in the 
problem of perceptual development 
of late, paralleling perhaps the sim- 
ilar revival in the area of perceptual 
learning (e.g., Gibson & Gibson, 
1955). Thus an increasing number of
individual studies, as well as several 
more extended programmatic re- 
search projects, such as those by 
Edgren (1953), Gollin (1956, 1959a, 
1959b) and Wapner and Werner 
(1957), are being reported which ex- 
plore a variety of perceptual prob- 
lems at different developmental 
levels. Similarly, a recent textbook 
on child development (Baldwin, 
1955) has turned to the facts of per- 
ceptual and cognitive development 
for its theoretical framework, in 


1 The survey of the literature reported in 
this article was carried out in part while the 
author was a postdoctoral National Science 
Fellow at the University of Geneva, Switzer- 
land. The critical comments of Seymour 
Wapner and Morton Wiener are gratefully 
acknowledged. 


sharp contrast to the consistent 
neglect of this topic in prior text- 
books in this field. 

This would seem, then, to be an
auspicious time for a review of the 
extensive literature on this topic, 
much of which, being of foreign ori- 
gin, is generally unfamiliar to Amer- 
ican psychologists. 

The present review is limited to ex- 
perimental studies involving com- 
parisons among two or more age 
groups. In order further to delimit as 
sharply as possible the area covered 
by this review, studies dealing on 
the one hand with tasks of a pri- 
marily cognitive, conceptual or psy- 
chomotor nature, and on the other 
hand with responses known to be 
direct functions of receptor mecha- 
nisms (e.g., temperature sensitivity, 
dark adaptation, etc.) were excluded. 

Following a brief consideration of 
methodological problems, studies will 
be reviewed under the following head- 
ings: sensory thresholds, illusions, 
orientation and localization, the con- 
stancies, depth, form, number, move- 
ment, time, and perceptual learning. 
A concluding section will be devoted 
to a review and discussion of the 
major developmental trends uncov- 
ered in this review. 


METHODOLOGICAL CONSIDERATIONS


A variety of methodological prob- 
lems faces the investigator in this 
particular field. Since some of them 
have rarely been spelled out and are
frequently ignored in the design and
execution of experiments, they are
discussed here briefly.2
First of all, several problems arise 
with regard to the sampling of Ss 
in cross-sectional comparisons among 
different age groups. Apart from the 
obvious requirement of comparabil- 
ity among the samples representing 
the various age levels, the choice of 
the particular age groups to be in- 
cluded in an investigation deserves 
more careful attention than it is gen- 
erally given. It is patently not valid 
to extrapolate trends obtained from 
a few selected age groups to the 
whole course of development, since 
age trends in this area are frequently 
discontinuous, nonlinear and even 
U shaped. Ideally, then, the investi- 
gation should include enough age 
groups to allow a determination of 
the total developmental trend over 
the age span under consideration. 
Secondly, several points may be 
brought out with respect to the 
statistical analysis of developmental 
data. The ordinary analysis of vari- 
ance is altogether insensitive to the 
order among the groups which are 
compared; thus it is relatively lack- 
ing in power when applied to the 
means of a number of groups repre-
senting points on an ordered variable
... By the same token the ... sig-
nificant age group effect does not ...
itself convey any information re-
garding the form or even ...
of a consistent age
trend. A decided improvement in
this situation is the technique of

2 ... appear ... forthcoming handbook on
... psychology. The writer is in-
debted to ... Gibson for a prepublication
... chapter.

trend analysis, particularly as de-
veloped by Grant (1956) to handle
nonlinear regressions. In applying 
this or similar parametric techniques, 
one needs of course to be alert to 
heterogeneity of variance in the data, 
since variability between Ss fre- 
quently decreases with age, accom- 
panying an increase in accuracy 
(intrasubject variability). Under cer-
tain circumstances, furthermore,
even an increase in variability with 
age may occur (e.g., Klimpfinger, 
1933). 
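
The contrast between the ordinary analysis of variance and a trend analysis can be shown with a small numerical sketch. It is an illustration under stated assumptions, not Grant's (1956) procedure: five equally spaced age groups of equal size are assumed, the group means are hypothetical, and the coefficient sets are the standard orthogonal polynomial coefficients for five groups. The function names are invented here.

```python
def between_groups_ss(means, n_per_group):
    """Between-groups sum of squares for equal-sized groups."""
    grand_mean = sum(means) / len(means)
    return n_per_group * sum((m - grand_mean) ** 2 for m in means)


def contrast_ss(means, coeffs, n_per_group):
    """Sum of squares for one contrast (one degree of freedom)."""
    psi = sum(c * m for c, m in zip(coeffs, means))
    return n_per_group * psi ** 2 / sum(c * c for c in coeffs)


# Orthogonal polynomial coefficients for five equally spaced groups.
LINEAR_5 = [-2, -1, 0, 1, 2]
QUADRATIC_5 = [2, -1, -2, -1, 2]

n = 8                                            # hypothetical Ss per age group
means_by_age = [10.0, 14.0, 16.0, 14.0, 10.0]    # hypothetical inverted-U age trend
same_means_reordered = [14.0, 10.0, 16.0, 10.0, 14.0]

# The omnibus between-groups term ignores the order of the groups.
print(between_groups_ss(means_by_age, n))
print(between_groups_ss(same_means_reordered, n))

# The trend components do not.
print(contrast_ss(means_by_age, LINEAR_5, n))
print(contrast_ss(means_by_age, QUADRATIC_5, n))
print(contrast_ss(same_means_reordered, QUADRATIC_5, n))
```

The between-groups sum of squares is identical for the ordered and the reordered means, whereas the quadratic component captures nearly all of the between-groups variation only when the means are in their developmental order. This is the sense in which the omnibus test is insensitive to order and the trend components are not.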

Finally, in the majority of the 
studies that rely on one or another 
of the psychophysical methods, there 
are various factors basic to the judg- 
ment being made which the develop- 
mental investigator needs especially 
to keep in mind, since the contribu- 
tion of such factors may not be con- 
stant across different age levels; thus 
they may in themselves account for, 
or at least seriously affect, the age 
changes observed in a given situa- 
tion. For instance, the methods of 
limits and adjustment typically give 
rise to starting-position effects which 
have been shown to change in 
amount and even direction with age 
(Piaget & Lambercier, 1951b; Wap-
ner & Werner, 1957). Consequently,
even if the results from increasing 
and decreasing series are averaged, 
the PSE's thus obtained will be sub- 
ject to varying amounts of error at 
the different age levels. This consid- 
eration suggests that in develop- 
mental comparisons of threshold 
values in particular these methods 
are of doubtful usefulness.
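
Why averaging does not remove the problem can be seen from a simple additive model, offered here purely as an assumption for illustration: suppose the increasing series displaces the PSE by one amount and the decreasing series displaces it in the opposite direction by another amount. The values below are hypothetical and the function name is invented.

```python
def averaged_pse(true_pse, ascending_bias, descending_bias):
    """Mean of the PSEs from an increasing and a decreasing series.

    Assumed model: the increasing series yields true_pse + ascending_bias,
    the decreasing series yields true_pse - descending_bias.
    """
    ascending = true_pse + ascending_bias
    descending = true_pse - descending_bias
    return (ascending + descending) / 2.0


true_value = 100.0

# Hypothetical starting-position effects at two age levels.
child_estimate = averaged_pse(true_value, ascending_bias=6.0, descending_bias=2.0)
adult_estimate = averaged_pse(true_value, ascending_bias=3.0, descending_bias=3.0)

print(child_estimate - true_value)   # residual error, asymmetric effects
print(adult_estimate - true_value)   # residual error, symmetric effects
```

Under this model the averaged PSE is in error by half the difference between the two displacements. The error vanishes only when the starting-position effects are symmetric, so if their size or direction changes with age, the residual error changes with age as well.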

A different constant error, com-
mon to all methods in which a stand-
ard is compared to a series of vari-
ables, is the so-called “error of the
standard” (Piaget & Lambercier,
1943a): the stimulus serving as the 
standard tends to be overestimated. 
This effect appears with particular 
prominence in constancy experiments
(Akishige, 1937; Piaget & Lamber-
cier, 1943b; Piaget & Lambercier,
1956a), but has been noted also in 
various other situations (Piaget & 
Lambercier, 1943a, 1953, 1956b; Riv- 
ers, 1905). Furthermore, a study by 
Tampieri (1955) indicates that the 
direction of the error of the standard 
may change from over- to underesti- 
mation, depending on the absolute 
size of the standard, which suggests 
an interpretation in terms of adapta- 
tion-level theory (Helson, 1948). If, 
as argued by Piaget and Lambercier 
(1943a), this error decreases with 
age—and a certain amount of evi- 
dence supports this assertion—it is 
clearly imperative to control for this 
effect when making comparisons 
among age groups. 

A further constant error may arise 
when the S has to match a standard 
to one of a set of variables presented 
simultaneously in an ordered series. 
According to Lambercier (1946a), 
a strong central-tendency effect en- 
ters into such judgments, systemati- 
cally shifting the choice of the vari- 
able in the direction of the middle of 
the series. This effect—which may 
be related to the error of the stand- 
ard, as interpreted by Tampieri 
(1955)—again appears to decrease 
with age. 

In conclusion, it is likely that a 
very large proportion of the research 
to be reported here is open to criti- 
cism on one or another of the above- 
mentioned grounds. This applies in 
particular to the extended series of 
studies by Piaget and his collabo- 
rators, which bulks rather large in 
our review, and whose shortcomings 
should thus be noted in brief. First 
of all, most of these studies employ a 
fairly unorthodox psychophysical 
method devised by Piaget and Lam- 
bercier (1943a) for use with children, 
in which the experimenter attempts 
to “zero in” on the PSE by alternat-
ing between variables on either side
of it. Although this “clinical con-
centric method” permits a relatively
rapid determination of an S’s PSE, in 
actual application it appears to be 
far from systematic and relatively 
lacking in precision. 
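
The zeroing-in procedure described above can be sketched as follows. This is a hypothetical reading of the description given here, not a reconstruction of Piaget and Lambercier's actual protocol: the stopping rule, the function name `concentric_pse`, and the simulated observer are all assumptions introduced for illustration.

```python
def concentric_pse(variables, judge):
    """Approach the PSE alternately from above and from below.

    variables -- variable-stimulus magnitudes, sorted ascending
    judge     -- callable returning 'smaller', 'equal', or 'larger'
                 for a variable compared with the standard
    Returns the midpoint of the interval left when neither side
    yields a clear judgment any longer (assumed stopping rule).
    """
    lo, hi = 0, len(variables) - 1
    lo_done = hi_done = False
    from_above = True
    while lo < hi and not (lo_done and hi_done):
        if from_above and not hi_done:
            if judge(variables[hi]) == "larger":
                hi -= 1          # still clearly larger: step inward
            else:
                hi_done = True   # no longer clearly larger: stop this side
        elif not from_above and not lo_done:
            if judge(variables[lo]) == "smaller":
                lo += 1          # still clearly smaller: step inward
            else:
                lo_done = True
        from_above = not from_above  # alternate sides on every trial
    return (variables[lo] + variables[hi]) / 2


def make_observer(pse, tolerance):
    """Hypothetical noise-free observer with a fixed zone of uncertainty."""
    def judge(value):
        if value - pse > tolerance:
            return "larger"
        if pse - value > tolerance:
            return "smaller"
        return "equal"
    return judge


variables = list(range(90, 112, 2))  # 90, 92, ..., 110
print(concentric_pse(variables, make_observer(pse=101, tolerance=1.5)))
```

With a real S the judgments are neither noise-free nor independent of the preceding trial, so the outcome depends on the experimenter's choice of starting values and step sizes; the sketch fixes these arbitrarily, which is precisely where the lack of system noted in the text would enter.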

A second point concerning Piaget's 
work is the general neglect of mat- 
ters of experimental design (e.g., 
order of presentation of different ex- 
perimental conditions) and of tests 
of significance. Closer attention to 
these points might perhaps have 
checked Piaget’s general tendency 
to overinterpret his results and to 
engage in lengthy ad-hoc explana- 
tions for findings of frequently dubi- 
ous significance—a tendency not 
calculated to enhance the readability 
of his reports (for example, cf. 
Piaget and Lambercier, [1953]). In- 
deed, this weakness tends to detract 
from the appreciation of the very 
real contribution which Piaget’s work 
represents, in its investigation of a 
large variety of significant problems 
and in the ingenuity and thorough- 
ness displayed in the manipulation 
of pertinent stimulus variables. 

If in examining these studies, then, 
as well as those from other sources 
which are frequently no less subject 
to methodological criticism, a rather 
heavy strain may seem to be placed 
on the reader's willingness to “as-
sume an attitude towards the mere
possible”—as Goldstein and Scheerer
(1941, p. 4) put it—this appears 
nevertheless to be a price well worth 
paying, in view of the interest of the 
problems and the theoretical issues 
which they will be found to raise. 


REVIEW OF EXPERIMENTAL STUDIES 
Sensory Thresholds 


The evidence regarding age 
changes in absolute and differential 
sensitivity for various physical di-
mensions will be reviewed rather
briefly, relying in the main on Peters’ 
(1927) review of the literature on 
this topic. (Except for some studies 
on absolute sensitivity in infancy, 
reviewed by Munn [1955, ch. 8], not 
much relevant work appears to have 
been carried out more recently.) 
Peters cites certain functions for 
which sensitivity decreases with age, 
such as the upper and lower absolute 
thresholds of pitch, as well as the 
two-point threshold of skin sensi- 
tivity. Most differential thresholds, 
on the other hand, steadily decrease 
with age, i.e., discrimination im- 
proves. Particularly marked changes 
in this direction have been found for 
pitch, weight and saturation, as well 
as for the recognition of hue (the 
threshold being measured here in 
terms of the degree of saturation re- 
quired). Visual acuity likewise im- 
proves with age, although the two 
studies cited by Peters differ with 
respect to the amount of improve- 
ment found. 


stantial cha 
tivity for si 
been shown 
at the sam 


e age of about four 
or five years, however, little further
change appears to take place in the
ability to discriminate size or length. 
Thus, Giering (1905) found no dif- 
ference between 6- and 14-year-old 
children in their ability to reproduce 
a distance between two points (the
performance of children below the
age of six is inferior, but this is quite 
likely due to the cognitive difficulty 
of the task). Estes (1959) likewise 
reports that the accuracy of size 
judgments for triangles, circles, and 
squares does not change in the age 
range from kindergarten to college 
students. Volkelt (1926) also ob- 
tained very closely comparable 
thresholds in his three age groups 
(three- to six-year-olds, an unspeci-
fied school-age group, and adults) for
comparisons of three-dimensional ob- 
jects, i.e., spheres. For lines and cir- 
cles the youngest group did yield 
substantially higher thresholds than 
the other two groups (at all age levels 
the width of the threshold was least 
for spheres and greatest for lines). 
These results for visual size judg- 
ments thus generally go counter 
to the substantial developmental 
changes in differential sensitivity for 
the dimensions of pitch, weight, hue,
and saturation noted previously. 
This discrepancy may reflect the 
greater complexity of the proximal 
stimulus in the case of the latter di-
mensions, and the opportunity thus
created for progressively finer dif-
ferentiation of the relevant informa-
tion in the stimulus, as argued by
Gibson and Gibson (1955) in their 
discussion of perceptual learning. 


Illusions 


Due in part to the persistent in- 
terest shown by Piaget and his col- 
laborators in the investigation of a 
large variety of illusions, this topic 
accounts for a considerable and per- 
haps disproportionate share of the 
literature on the development of per- 
ception. As we shall see, however, 
much of this work touches on prob- 
lems of more general significance to 
perceptual development. In order to 
facilitate comparisons among studies 
on the same phenomenon, the results
for the more frequently studied illu-
sions are summarized in Table 1;
some of these, involving assimilation
and contrast phenomena, are repro-
duced in Fig. 1.³

TABLE 1
SUMMARY OF DEVELOPMENTAL STUDIES OF SELECTED ILLUSIONS

Illusion and Reference | Age Groups Included | Age Trend Observed | Remarks

Delboeuf (Fig. 1a)
Giering, 1905 | 6 and 14 | Illusion less at 14 |
Piaget, Lambercier et al. (1942) | 5 to 12 & adults | (contrast shown mainly at 5-7 years) | Errors studied as function of size of circles (see text).
Rüssel, 1934 | 4 to 6½ | Small, irregular changes |

Parallel Lines (Figs. 1b, 1c)
Giering, 1905 | 6 and 14 | No difference | Only four values of variable used.
Piaget & v. Albertini, 1950 | 5 to 9 & adults | Decrease, in general | Errors studied as function of dimensions of figure (see text).

Titchener Circles (Fig. 1d)
Rüssel, 1934 | 4 to 6½ & adults | Small decrease from 4 to 6½ |
Wapner & Werner, 1957 | 6 to adults | Increase (irregular) | Variables displayed in ordered series on chart.

Size-Weight
Ciampi (see Rey, 1930) | 3-6; 7-15; adults; aments; dements | Increase in % of Ss showing illusion | No psychophysical measures. For percentage of Ss, aments = normals; dements lowest.
Ohwaki, 1953 | 3; 4; 5; aments | Increase (as in Ciampi) | Aments = normals of same M.A.
Rey, 1930 | 5 to adults; aments | Increase to 9, then decrease | Adults not “naive.” Illusion lowest in aments.

Müller-Lyer
Binet, 1895 | 7-12 and 10-14 | Less in older group |
Noelting, in press | 5 to 10 & adults | Decrease | Practice effects studied (see under perceptual learning).
Piaget & v. Albertini, 1950 | 5 to 9 & adults | Illusion smallest for adults | Errors studied as function of dimensions of arrows (see text).
Piaget, Maire & Privat, 1954 | 4 to 10 & adults | Consistent decrease | Normal illusion compared to that in Fig. 3 (see under form).
Pintner & Anderson, 1916 | 6 to adults; aments | Decrease | Strong anticipation effects. Adult aments = 11-year normals.
Van Biervliet, 1896 | 12-16 & adults | Less for adults | Figure presented vertically; angles of arrows varied.
Walters, 1942 | 6 to 19 | Decrease to 10; rise 15-19 | Illusion increased on retest of 6-9 group 3 years later.
Wapner & Werner, 1957 | 6 to 19 | Decrease to 12; rise 15-19 |

Horizontal-Vertical
Fraisse & Vautrey, 1956 | 6; 9; adults | Varies with type of figure and exposure time (see text) | Some differences between adults of differing educational backgrounds.
Rivers, 1905 | Children & adults | Less for adults | Data show overestimation of standard (whether V or H).
Walters, 1942 | 6 to 19 | Decrease | Trend confirmed in retest of 6-9 group 3 years later.
Winch, 1907 | 8 to 15; adults | Slight decrease | Based on drawings (V to be equaled to H).
Würsten, 1947 | 5 to 13; adults | Rise to 10, then decrease | Various figures used; lines mostly nonintersecting.

FIG. 1. Illusions of assimilation and contrast, as investigated in developmental studies cited in the text. Panels (standard and variable shown for each): a, concentric circles (Delboeuf); b, parallel lines; c, parallel lines; d, Titchener circles.

3 A study by Hartmann and Triche (1933)
on age changes for a number of familiar optical
illusions has been omitted from consideration
in this review, since the authors limited them-
selves to the determination of the proportion
of subjects susceptible to the various illusions.
Given the compelling nature of most of these,
the almost wholly negative results obtained
would seem to be of questionable value.

Spatial assimilation and contrast
effects. The work on the Delboeuf
and parallel-line illusions indicates
that effects of assimilation to con-
textual stimuli decrease with age (cf.
Table 1). This is shown clearly in
the study by Piaget, Lambercier,
Boesch and von Albertini (1942) on
the Delboeuf illusion (Fig. 1a), in
which the relative size of the circles
was systematically varied, yielding
an S-shaped function, as illustrated
in Fig. 2. (The curves in Fig. 2 ac-
tually represent an idealized com-
posite of five separate sets of curves
obtained for different absolute values
of the stimuli. In particular, in three 
of the five cases the separate age 
curves differ substantially in the 
points at which they cross the X 
axis.) It will be noted that beyond a 
certain point further increase in the 
size of the context circle results in a 
contrast phase; however, only for 
the younger children does this phase 
appear consistently and prominently. 
Very similar results were found by
Piaget and von Albertini (1950) for
the parallel-line illusion (Figs. 1b
and 1c). Again, as the relative
lengths of the inner and outer lines
were varied, assimilation first in-
creased, then decreased, changing
eventually to contrast, though mainly 
for the children. Furthermore, varia-
tion in the vertical distance between
the middle and the outer lines re-
sulted in a change from a slightly 
negative to a strongly positive error 
(i.e., overestimation of the middle 
line for both figures). This latter re- 
sult, indicating that this variable op- 
erated at the same time to increase 
assimilation (in Fig. 1b) and contrast 
(in Fig. 1c), together with the finding, 
corroborated in the Delboeuf study, 
that both assimilation and contrast 
effects decrease with age, suggests the
possibility that both of these effects 
may be reducible to a single mech- 
anism. 

Piaget, in fact, has proposed such 
a mechanism, which he terms “cen- 
tration,” and which he has applied 
to a variety of illusions and related 
phenomena in perceptual judgment. 
This mechanism produces systematic 
distortions in perception, due to over- 
estimation of the fixated elements of 
the stimulus field. In the course of 
development, however, these distor-
tions are increasingly offset by “de-
centrations,” resulting from more
mobile and effective exploration of
the total figure (cf. Piaget, Lam-
bercier, Boesch & von Albertini,
1942).

FIG. 2. Strength of concentric-circle illusion
as a function of the relative width of the ring
(after Piaget, Lambercier, Boesch & von
Albertini, 1942). Ordinate: error in perceived
size of inner circle; separate curves for
children and adults.

Through further elaboration of
these constructs Piaget has indeed 
arrived at a probabilistic model for 
perceptual judgment (Piaget, 1955) 
which he has successfully applied to 
a variety of illusions, all of which 
feature error curves of the type 
shown in Fig. 2 as a function of spe- 
cific stimulus dimensions (e.g., Piaget 
& Pène, 1955; Piaget & Vurpillot,
1956). For our purposes, however, 
the main point is that according to 
Piaget’s theory the shape of the error 
function remains constant with age, 
the extent of the illusion decreasing 
over all portions of the function. 
Thus both assimilation and contrast 
should decrease with age. Yet Wap- 
ner and Werner (1957), in their study 
on the Titchener circles (Fig. 1d)— 
an illusion at least superficially sim- 
ilar to those which we have been con- 
sidering—found a contrast effect 
which increased with age. To what 
extent this discrepancy may be at- 
tributable to possible central-tend- 
ency effects arising from the exposure 
of the variables in an ordered series 
(cf. methodology section) and to 
what extent it reflects a more basic 
difference in the processes operating 
in this illusion is difficult to decide 
at this point. Systematic variation 
of the stimulus dimensions, as in the 
above-mentioned studies by Piaget 
and his collaborators, should lead to 
clarification of this question. 

Temporal contrast effects. A study 
by Ikeda and Obonai (described in 
Sagara & Oyama, 1957) conclusively 
demonstrates that the Delboeuf illu- 
sion (Fig. 1a) can be transformed 
from an assimilation to a contrast 
phenomenon by changing the tem- 
poral relationship between the ex- 
posures of the outer and inner circles 
from simultaneity to succession. De-
velopmentally there appears to be a
similar opposition between the simul-
taneous and successive conditions.
This is indicated in Piaget and
Lambercier’s (1944) study of the 
Usnadze illusion, which is essentially 
equivalent to the Delboeuf illusion 
under successive presentation: two 
circles differing substantially in size 
are repeatedly exposed in a tachisto- 
scope, following which a standard is 
shown within the area formerly occu- 
pied by the larger circle. A contrast 
effect develops (i.e., the standard is 
underestimated) which is greater in 
adults than in five- to seven-year-old 
children, in the sense that the illusion 
grows at a faster rate for adults as the 
number of presentations of the induc- 
ing stimulus increases. At the same 
time, however, the effect dissipates 
more rapidly for the adults after the 
final exposure of the inducing stimu- 
lus. It is of interest to note the cor- 
respondence of these findings with 
those of a recent study of figural af- 
tereffects in normal and mentally de- 
ficient adolescents by Spitz and 
Blackman (1959), who found that 
their normals satiated more quickly, 
but that the effects also dissipated 
faster, in comparison with the men- 
tally deficient group. 

Piaget and Lambercier’s explana- 
tion of the increase in contrast with 
age, in terms of adults’ greater sus- 
ceptibility to anticipation or set, re- 
ceives some support from the similar 
developmental trend found for the 
size-weight illusion, which has usually 
been attributed to the contrast be- 
tween the perceived weight of an ob- 
ject and that anticipated in view of 
its size. As Table 1 shows, suscepti- 
bility to this illusion increases mark- 
edly with age; it is also smaller in 
mental defectives as compared to 
normals of the same chronological 
age. 

The Müller-Lyer illusion. This
well-known illusion has proved very
popular in developmental investiga-
tions, as seen in Table 1, which
shows, moreover, substantial agree-
ment among the several studies as
regards the decrease of this illusion 
with age. This decrease has gener- 
ally been interpreted in terms of the 
younger child’s difficulty in isolat- 
ing the parts from the whole—a 
problem which will be more thor- 
oughly discussed in the section on 
form perception. One empirical 
question which remains to be settled 
concerns the pattern of changes in 
this illusion through the period of 
adolescence. The studies by Walters 
(1942) and by Wapner and Werner 
(1957) both indicate a rise in the 
strength of the effect in later adoles-
cence and early adulthood, whereas
in the studies from the Geneva lab- 
oratories (Noelting, in press; Piaget, 
Maire, & Privat, 1954; Piaget & von 
Albertini, 1950) the adult groups 
yield the smallest average illusion. 
(None of the Geneva studies, how- 
ever, included Ss in the adolescent 
range.) 

The investigation by Piaget and 
von Albertini (1950) is of special 
interest for the comparison made be-
tween the Müller-Lyer and the par-
allel-line illusion. Pointing to the
formal similarity between the two 
types of figures (the latter may be 
derived from the former by eliminat- 
ing the arrows and connecting their 
end-points), the authors have studied 
the changes in the Müller-Lyer as a


function of the same stim 
ables mani 


figures. 
those due to a 


parallel-line illusion, even though in 
absolute terms the effects for the 
Müller-Lyer are much stronger. Thus
the inference of common processes
operating in the two situations ap-
pears justified.

The horizontal-vertical illusion. De-
spite a considerable amount of de-
velopmental work on this illusion
(cf. Table 1), the nature of its 
change with age remains uncertain. 
In view of the differences among the 
various studies in the types of stim- 
ulus figures used, as well as in the 
methodology employed, and in view 
also of the failure to control system- 
atically contaminating factors, such 
as the role of the bisection of the 
horizontal in the inverted T, the true 
meaning of the conflicting results 
obtained is difficult to assess.

The findings from the two major 
studies, by Walters (1942) and by 
Würsten (1947), nevertheless suggest
the following tentative conclusion:
if the two lines of the figure intersect
(as they presumably did in Walters’
study), the effect decreases with age;
if, on the other hand, the S has to
compare noncontiguous segments,
as in almost all of Würsten’s stimuli,
the illusion increases up to a maxi-
mum at about age 10, subsequently
declining up to adulthood—a trend
which emerged very regularly and
consistently from Würsten’s data.
In partial support of this statement
one may cite the study by Fraisse
and Vautrey (1956), who found an
increase between 6 and 10 years for a
figure made up of two spatially sep- 
arated segments, but found no ap- 
preciable age trend for the inverted- 
T figure. (The two figures also dif- 
fered markedly in terms of the effects 
of short vs. long exposure times.) 


What might account for this dis-
crepancy between these two types of
figures? A clue may perhaps be found
in Piaget’s interpretation of Wür-
sten’s findings as reflecting the grad-
ual development of a stable system of
spatial coordinates in the age range
from 6 to 10, resulting in an enhance-
ment of the illusion (cf. Piaget &
Morf, 1956). But this factor would
probably operate only if the two 
lines to be compared are not spa- 
tially contiguous, so that their spa- 
tial relationship is not clearly defined 
for the S. If the lines intersect, on 
the other hand, they provide in ef- 
fect their own spatial framework, al- 
lowing for a more immediate opera- 
tion of the illusion. The fact that the 
illusion is considerably stronger in the 
latter case (Fraisse & Vautrey, 1956) 
would support this argument. 

Other illusions of orientation. If, 
as we shall demonstrate in the sec- 
tion on form perception, young chil- 
dren are relatively insensitive to the 
orientation of stimuli in space, one 
might expect them to be less suscep- 
tible than older Ss to a variety of 
other illusions which depend on this 
factor. A case in point is the Schu-
mann-square illusion, involving an
overestimation of a square tilted at a 
45-degree angle (i.e., a diamond), in 
comparison with a square of equal 
size oriented along the spatial hor- 
izontal and vertical. Hanfmann 
(1933) did in fact find an increase be- 
tween the ages of three and six years 
in the percentage of Ss responding to 
this illusion (after this age there was 
a decrease, attributed by the author 
to negative suggestion on the part 
of the older children). An attempt to 
measure the strength of this illusion 
by a psychophysical method did not 
result in any age differences, but the 
series of variables was apparently too 
coarse to yield very precise data. 

In conclusion, several related 
studies by Piaget and his collabora- 
tors may be mentioned, which gen- 
erally show an increase with age. 
Thus, in the comparison of two hori- 
zontal lines, staggered so as to form 
two sides of a parallelogram, the top 
line is overestimated increasingly be-
tween 5 and 8 years (Piaget &
Taponier, 1956), while Piaget and
Morf (1956) find a similar age trend 
for the comparison between two sim- 
ilarly staggered vertical lines, as well 
as in the comparison of two oblique 
segments lying on a straight line. 
These various age trends are inter- 
preted in terms of the increasing role 
of the spatial framework in the per- 
ception of the relationship between 
the lines. Accordingly, this age trend 
should not apply when two vertical 
lines are directly superimposed, since 
here the framework could not play 
an important role; indeed, Piaget 
and Morf (1956) find that in this 
situation the overestimation of the 
top line decreases with age. As the 
separation between the lines is in- 
creased, however, the overestima- 
tion becomes attenuated and the age 
differences correspondingly less con-
sistent (Piaget & Lambercier, 1956b).

4 This study represents one of the very few
instances (cf. also Piaget & Lambercier,
1946) in which Piaget has concerned himself
with the relationship between perception and
thinking in the context of an experimental
study. In the present article he points to the
contrast between the increase in error with
age in the perceptual judgments and the
marked decrease found over the same age
range in the incidence of a purely cognitive
error—lack of conservation of length—arising
under identical stimulus conditions, when the
actual equality of the lengths to be compared
has been demonstrated to S beforehand.

It might be pointed out that Piaget appears
in general to be more impressed by the differ-
ences than by the similarities between these
two types of functions and their development;
accordingly his work on perception is almost
totally divorced from his more widely known
work on conceptual development. He has,
however, outlined the interrelationship be-
tween the two in more formal, theoretical
terms (Piaget, 1950; Piaget & Morf, 1958).

Miscellaneous illusions. A few illu-
sions which have received only inci-
dental attention will be mentioned
briefly. Decreases with age have
been found for the Sander parallelo-
gram (Heiss, 1930), the Poggendorff
illusion (Vurpillot, 1957), the Oppel- 
Kundt segmented-line illusion (Pia- 
get & Osterrieth, 1953), as well as 
for constant errors in the judgment 
of angles and of the lengths of their 
sides (Piaget, 1949; Piaget & Pène,
1955), in the judgment of chords of a 
circle (Piaget & Vurpillot, 1956) and 
in the comparison of the widths of 
rectangles varying in length (Piaget 
& Denis-Prinzhorn, 1953; cf. also 
Seashore & Williams, 1900). In-
creases with age appear in the case of
the Jastrow illusion, which Giering 
(1905) found to be stronger at 14 
than at 6 years, and for the Ponzo 
illusion (overestimation of a line 
which intersects a bundle of lines 
converging on a near-by point), for 
which Leibowitz and Heisel (1958) 
report an increase between four and 
seven years, though no further
changes beyond that age. Finally, 
for an illusion of inclination, involv-
ing two staggered rows of rectangles
separated by a horizontal line, which 
is perceived as tilted, the results of 
Piaget and Denis-Prinzhorn (1954) 
indicate in general an increase up to 
age 10, followed by a decrease, possi-
bly reflecting processes similar to
those operating in Würsten’s (1947)
study of the horizontal-vertical illu-
sion. This effe
influenced by
such as the
around the fi
or solid rect
illusions, we should n
of particular interest manipulated by
Vurpillot (1957) in her study of the
Poggendorff illusion, that of the
meaningfulness of the figure. When
this author altered the figure so as to
suggest a concrete scene, while leav-
ing the basic stimulus elements un-
changed, a very significant decrease
in the illusion resulted for the 9-year,
12-year, and adult groups, whereas 
the two youngest groups (5 and 
7 years) failed to profit from this 
alteration. While it would be dan- 
gerous to extrapolate this finding to 
the role of meaningfulness in gen- 
eral, it would seem to weaken the oft- 
heard argument that the poorer per- 
formance of young children in per- 
ceptual tasks is due to the use of ab- 
stract, meaningless stimuli. 


Orientation and Localization 


Spatial orientation. An extensive 
investigation has been carried out by 
Witkin et al. (1954) on the interac- 
tion of visual and bodily cues in the 
perception of the vertical at different 
age levels. These authors distinguish 
between two types of tasks: “field-as-
a-whole” tests (e.g., S, in a tilted
position, has to adjust a tilted room
to the vertical) and “part-of-a-field” 
tests (e.g., S, in a tilted room, has to 
adjust his own position to the verti- 
cal). The first type of situation 
yielded no consistent age trends; on 
the other hand, in the part-of-a-field 
tests the Ss’ dependence on the visual 
framework seemed to decrease sub- 
stantially with age. Thus in the 
tilted-room-tilted-chair test the tilted 
condition of the room has progres- 
sively less effect on S’s perception of 
himself as vertical. In interpreting
this result, however, it is important 
to bear in mind the fact that even in 
the absence of conflicting visual cues 
the adjustment of the body to the 
vertical becomes increasingly accu- 
rate with age (Liebert & Rudel, 1959). 
Witkin’s group likewise found a 
steady decrease with age in the influ- 
ence of a tilted luminous frame en- 
closing a luminous rod in an other- 
wise dark room on the perception of 
the rod as vertical. But in a partial 
replication of this experiment by
Edgren (1953) no such age differences
were found; instead, at all age levels 
(between eight and 16) the judg- 
ments ran counter to the direction of 
the frame. 

A somewhat different emphasis is 
encountered in the work of Wapner 
and Werner (1957) who concern 
themselves with the influence of di- 
rectional sets, induced visually or 
kinesthetically, on S’s orientation in 
space. These authors distinguish be- 
tween three types of effects. The first 
of these involves extraneous stimula- 
tion, and is exemplified in judgments 
of verticality with the body in a 
tilted condition. Here the adult S 
overcompensates for the bias due to
the tilt, so that the vertical is dis- 
placed in the opposite direction. But 
this tendency appears to develop 
only relatively late: prior to adoles- 
cence children show only very slight 
effects due to body tilt; for the girls, 
furthermore, the age curve crosses 
over at the youngest age level, where 
the vertical is displaced in the direc- 
tion of tilt. Interestingly enough, 
this same displacement reappears in 
old age (Comalli, Wapner, & Werner, 
1959). Independent confirmation of 
the developmental shift towards in- 
creasing overcompensation for ex- 
traneous stimulation, in the age 
range from early childhood to young 
adulthood, has most recently come 
from a similar study by Liebert and 
Rudel (1959) on the auditory local- 
ization of a moving tone with the 
body in a tilted position. 

A second effect, static object stimu- 
lation, represents a bias due to the 
displacement of the object of judg- 
ment itself. It is illustrated primarily 
in starting-position effects: with the 
rod originally inclined to the left, the 
adjustment to the vertical deviates 
to the left, and vice versa for the rod 
originally inclined to the right. This
effect decreases fairly regularly with
age up to the college level, and fur-
thermore does not increase again in
old age (Comalli, Wapner & Werner,
1959). A further experiment reported 
by Wapner and Werner (1957), in- 
volving the adjustment of a luminous 
square to the apparent median plane, 
confirms the decrease with age in 
these starting-position effects, as 
does Liebert and Rudel’s (1959) 
study on auditory localization. 

The third of the effects considered 
by Wapner and Werner, dynamic ob- 
ject stimulation, applies to directional 
sets induced through the presenta- 
tion of objects or words denoting 
particular directions (e.g., hands 
pointing upwards and downwards). 
For these effects the age trends were 
rather less consistent. Two of the 
studies on this problem disclosed, 
however, a developmental shift inde- 
pendent of these directional sets in 
the location of the apparent horizon, 
which was set at progressively lower 
levels with increasing age. 

Spatial and cutaneous localization. 
A clearly related problem is that of 
spatial localization, which has been 
investigated in several developmental 
studies, giving at least indirect sup- 
port to the increased effectiveness of 
the spatial framework. In the visual 
domain, Rey (1955) reports a steady 
decrease in the deviation of the re- 
produced location of a point from the 
true one, following a 5-sec. fixation
period. Furthermore, if an extrane- 
ous stimulus—a point, a straight line, 
or a circle—is drawn on the sheet on 
which the point is to be localized, it 
will, for the younger children, exert a 
strong “pull,” i.e., their localizations 
will be systematically shifted towards 
such a stimulus. This effect decreases
with age, and may even become neg-
ative in adults. It would be of inter-
est to determine the role of such
stimuli when they can themselves
serve as referents for the localization 
to be made (i.e., when they are shown 
in the field in which the point is 
originally exposed, as well as in the 
subsequent one). 

Sandström and Lundberg (1956)
have failed to confirm the improve- 
ment in spatial localization with age, 
but since the response they used in- 
volved motor coordination to a much 
greater extent (the point, first shown 
on the top of a table, had to be found 
by sticking a pin on the underside of 
the table), it is scarcely comparable 
with Rey’s. The role of the motor 
aspect is in fact shown by the age 
changes in the direction of the errors 
made: the children tended to “over- 
shoot the mark,” their errors being 
predominantly in the direction oppo- 
site to the hand used for the response. 

Several studies are available on the 
development of cutaneous localiza- 
tion, but the results obtained are not 
too consistent. Dunford (1930) 
studied children between 3 and 15 
years of age; he found only irregular 
changes in accuracy up to age 9; how- 
ever, the largest mean errors were 
made by his 11- and 15-year-olds. 
Renshaw and Wherry (1931) failed 
to find any consistent age differences 
throughout childhood and adoles- 
cence; only his adult sample was 
clearly differentiated from the others, 
their mean errors being highest. The
authors compare these results with
those obtained under visual direction
(i.e., eyes open, rather than blind-
folded), which show a steady de-
crease in error. Renshaw sees [...]
together with those [...] (Renshaw,
1930) which [...] adults to be
inferior [...] visual dominance
in the perception of the body.
In further support of
this argument he adduces findings 
from a set of 11 blind Ss—seven older 
children and four adults—showing 
an improvement with age in cutane- 
ous localization (Renshaw, Wherry & 
Newlin, 1930). But the small num- 
ber of Ss used in all of Renshaw’s 
studies (though his data represent 
means of a large number of trials for 
each S), together with the uncertain 
effect of the change in sheer anatomi- 
cal dimensions with age on these ab- 
solute errors suggest caution in the 
interpretation of his results. 


The Perceptual Constancies 


The empiricism-nativism contro- 
versy has provided the impetus for a 
lively interest in the development of 
the constancies, and in particular 
that of size constancy, as well as in 
the related topic of depth perception. 
Some of the major issues in this area,
however, confront us equally in the
rather more compact literature on
brightness constancy, which thus
may serve as a convenient starting
point.


Brightness. The two most im-
portant studies on developmental
changes in brightness constancy, by 
Brunswik (1929) and Burzlaff (1931), 
represent sharply divergent positions 
on the question of the evolution of 
the constancies. Brunswik, empha-
sizing the role of experience in bring-
ing about the perceptual achieve-
ment of the organism reflected in
constancy, finds a steady increase 
in constancy between ages 3 and 11, 
followed by a slight but consistent 
falling off after that point. This re- 
duction in constancy between early 
adolescence and adulthood, which is
confirmed by the data of Walker 
(1927), is interpreted by Brunswik 
(1956) in functional terms, as indicat-
ing a partial supplanting of percep-
tion by intellectual functions in the
adult. 

Under experimental conditions 
closely comparable to Brunswik’s, 
Burzlaff (1931) did substantiate the 
increase in constancy with age, at 
least between four and seven years 
(it should be noted that this study is 
limited to 20 children in this age 
range, together with a sample of five 
adults). But when the variable, in-
stead of being presented as an iso-
lated color wheel, was displayed in
an ordered series of greys on a chart,
the age trend disappeared, and at the
same time constancy improved gen-
erally. This held true whether a color
wheel or a random series of greys
was used for the standard. Further-
more, Akishige (1937) has shown that
if color wheels are used for both
standard and variable, merely mak-
ing a chart of greys available does
not alter the situation: the age trend 
reappears, and constancy generally 
deteriorates, in comparison with judg- 
ments in which the variable itself is 
part of an ordered series. 

The discrepant results found by 
Burzlaff and Brunswik gave rise to 
some heated debate as to the relative 
validity of the two sets of findings. 
Brunswik (cf. note to Klimpfinger, 
1933, pp. 619ff) argued that Burz- 
laff’s situations with serial presenta- 
tion of the variable were too simple 
to allow age differences to manifest 
themselves; Burzlaff (1931) main- 
tained, on the contrary, that Bruns- 
wik’s experimental conditions were 
too artificial and unstructured to be 
meaningful for the children. This
discussion misses what seems to be 
the main point, namely, the demon- 
strated role of the perceptual situa- 
tion, and more particularly of the 
difficulty of the judgment for the S, 
as a determinant of developmental 
differences.

Shape. Klimpfinger (1933), in a
study carried out under Brunswik’s 
auspices, found a developmental 
trend rather similar to that found by 
Brunswik for brightness constancy: 
an increase up to about 14 years, fol- 
lowed by a subsequent decline, more 
pronounced than that in the bright- 
ness study: the mean of the older of 
the two adult groups corresponds 
roughly to the 8-year-old level. 
There is, however, some suggestion 
of bimodality in the distribution of 
the values at the older age levels. 

Akishige (1937) has verified the in- 
crease of shape constancy between 
ages 4 and 10. At the same time he 
finds that this age change disappears 
when the variable is shown in a series 
of ellipses of varying eccentricity, 
i.e., under conditions comparable to 
those of Burzlaff for brightness con- 
stancy. Constancy is again generally 
higher under these latter conditions. 

Size. The developmental investi- 
gation of size constancy has been far 
more thorough than for brightness 
and shape, both in terms of the 
manipulation of relevant stimulus 
variables and in terms of the exten- 
sion of the work to infants and pre- 
verbal children. As we shall see, 
however, the methodological prob- 
lems in the way of adequate testing 
of constancy in infants are formid- 
able, and the results obtained are ac- 
cordingly of dubious value. 

Thus, an oft-cited study by Cruik- 
shank (1941) purports to show that 
constancy, though not present at 
birth, appears by about 6 months. 
This conclusion is based on the fact 
that after that age infants ceased to 
respond consistently to a rattle at 
75 cm., while they did respond to 
one equal to it in projective size, at a 
distance of 25 cm. But clearly this 
comparison only indicates discrim-
ination of depth, i.e., a tendency to
ignore objects that are relatively far
away. Indeed, if this method allowed 
any inference concerning constancy, 
one would have to conclude that it 
was absent altogether, since re- 
sponses to a rattle at 75 cm. equal in 
physical size to the one shown at 25 
cm. were infrequent at any age level, 
and decreased steadily after 15 weeks. 
Cruikshank’s procedure appears 
inherently unsuited to answer the 
question of constancy, since only a 
single rattle was shown at any given 
time. More recently, Misumi (1951) 
has approached the problem through 
a stimulus preference method, with 
somewhat different results. The main 
feature of his method is his determi- 
nation of the extent of the S’s spon- 
taneous preference for one of two ob- 
jects differing in size, when presented 
at the same distance from the S. This 
is then used as a baseline in the evalu- 
ation of the results for pairs of ob- 
jects of which one is removed in 
depth. Thus, from the age of five 
months on, 77% of the responses are 
made to the larger of two goldfish; 
when it is moved back so as to be 
equal in retinal size with the smaller 
one, the choices of the larger drop to
[...] it in depth.

More satisfactory and unambigu-
ous results can be obtained from non-
verbal Ss by first establishing a stable
differential response to size and then
testing its maintenance as the dis-
tance of the stimuli is manipulated.
This approach has been applied 
(Beyrl, 1926; Frank, 1925), though 
not in very systematic fashion; 
Frank’s study is nevertheless of in- 
terest in demonstrating that one- to 
two-year-old children can consistently 
adhere to their size discrimination 
even with the larger stimulus as far 
as 15 m. removed from the smaller 
one. Beyrl reports, however, that an 
attempted replication of Frank’s 
work resulted in a considerably 
higher incidence of errors. 

The major portion of Beyrl’s 
(1926) investigation consists of a 
psychophysical study of size con-
stancy at different age levels, which,
in contrast to the work reported here- 
tofore, yields quantitative measures 
of the degree of constancy attained. 
His Ss were 2½- to 11-year-old chil-
dren and a group of adults; they were
given cubes, as well as discs, to judge
as to size by a method of constant
stimuli, with the standard at 1 m. and
the variables at distances from 2 m.
to 11 m. The age curves for the judg-
ments at the separate distances of
the variable all show a steadily de- 
creasing tendency to underestimate 
the far stimulus, becoming negligible 
by age 11: as distance increases, 
furthermore, the younger Ss lag 
further and further behind the older 
ones in their approach to constancy. 
This is true both for the cubes and 
for the discs, but to an apparently 
greater extent for the latter. 

Although these results clearly show 
that size constancy is far from per- 
fect at the younger ages, this study 
has been cited repeatedly as demon-
strating nearly 100% constancy even
at the lowest age level. The basis for 
this misstatement is the graph 
through which Brunswik has com- 
pared this study with others of color 
and shape constancy (see Klimp-
finger, 1933; also Brunswik, 1956, p.
83), and which indicates a size-con-
stancy ratio as high as .92 at the age
of two and one-half years. This figure 
is, however, in large part an artifact 
of the arithmetical properties of the 
Brunswik ratio, which is virtually in- 
sensitive to any except very drastic 
errors at longer distances, as re- 
flected in the paradoxical finding that 
the values of this ratio actually in- 
crease with distance for the younger 
children! If we add to this observa- 
tion the disquieting though generally 
ignored fact that the results from 19 
of the 55 preschool children had to 
be discarded, as the judgments were
too inconsistent to allow the calcula- 
tion of a PSE, the departure from 
constancy in the judgments at these 
lower age levels is seen to be quite 
considerable. 
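
The insensitivity of the Brunswik ratio at long distances can be checked with a little arithmetic. The sketch below assumes the usual form of the ratio, R = (S - P)/(C - P), where S is the obtained match (PSE), C the match expected under perfect constancy, and P the match expected under pure projective (retinal-angle) equality. The function names, the 10-cm. standard, and the 50% error are hypothetical illustrations and are not taken from Beyrl's data.

```python
def projective_match(standard_size, standard_dist, variable_dist):
    """Size the far variable needs in order to subtend the same
    visual angle as the standard (the pure retinal-size match)."""
    return standard_size * variable_dist / standard_dist


def brunswik_ratio(pse, standard_size, standard_dist, variable_dist):
    """Brunswik ratio R = (S - P) / (C - P).

    R = 1 means perfect constancy, R = 0 a purely projective match.
    """
    p = projective_match(standard_size, standard_dist, variable_dist)
    c = standard_size  # perfect constancy: variable physically equals standard
    return (pse - p) / (c - p)


if __name__ == "__main__":
    standard = 10.0  # hypothetical standard size (cm.) at 1 m.
    pse = 15.0       # hypothetical PSE: far stimulus chosen 50% too large
    for distance in (2, 4, 8, 11):
        ratio = brunswik_ratio(pse, standard, 1.0, distance)
        print(f"variable at {distance:>2} m.: R = {ratio:.3f}")
```

Under these assumptions the same 50% error yields a ratio of .50 with the variable at 2 m. but .95 at 11 m., so that a constant relative error appears as steadily "improving" constancy as distance grows.
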

While virtually all of the subse- 
quent studies on size constancy have 
substantiated this increase in con-
stancy with age, several variables
appear to be important determiners
of the extent and even the presence
or absence of this age trend. Thus,
just as in the case of shape and
brightness constancy, the presenta- 
tion of the variable in an ordered 
series tends to equalize the perform- 
ance of the different age groups, and 
improve constancy generally. This 
was shown by Burzlaff (1931) in an 
exploratory study modeled after his 
brightness-constancy work discussed 
above. On the other hand, in a re- 
versal of this situation—variable 
shown singly and standard in an
ordered series—Akishige (1937) did
find a marked age trend between 4 
and 10 years, suggesting that Burz- 
laff’s effect is more specifically con- 
nected with the judgmental process 
involved in the selection of the vari- 
able to be equated to the standard. 

This is in fact the position taken by
Lambercier (1946a), in a paper which
undoubtedly presents the most ex- 
haustive and searching analysis avail- 
able of methodological factors in the 
size-constancy problem. He believes 
that Burzlaff’s and Akishige’s results 
are to be attributed to the operation 
of a central tendency effect, already 
referred to in the methodology sec- 
tion, which is especially marked in 
the children, systematically displac- 
ing their judgments towards the mid- 
dle of the series of variables. Where, 
as in Burzlaff’s case, the standard is 
equivalent to this middle, the net 
effect would be to counteract any 
constant errors due to imperfect con- 
stancy. In the experimental portion 
of his study, Lambercier presents 
data which seem to bear out his 
thesis: as the center of the series of 
variables is shifted away from the 
value of the standard, the errors are 
systematically displaced in the direc- 
tion of this center, but to an extent 
decreasing with age. Unfortunately 
the more eccentric variable series led 
at times to refusals on the part of S 
to admit any stimulus as equivalent 
to the standard, with awkward con- 
sequences not only for the calcula- 
tion of PSE’s, but particularly for the 
validity of the age comparisons, since 
such refusals occurred chiefly in 
adults. Furthermore, Lambercier’s 
hypothesis cannot account for Burz- 
laff’s brightness constancy data, since 
there the standard was generally ec- 
centric with respect to the series of 
variables. It may be noted that 
Lambercier finds consistent increases 
in constancy across a variety of other 
experimental conditions; again de- 
velopmental changes were most pro- 
nounced under single presentation 
of standard and variable, although 
they were attenuated due to practice 
effects upon repetition of the single-
variable trials following judgments
under other, apparently easier condi-
tions.
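
Lambercier's argument can be illustrated with a toy calculation. The linear weighting used below is purely an assumption made for the sake of illustration; it is not Lambercier's own formulation, and the parameter `pull` as well as all numerical values are hypothetical.

```python
def observed_pse(constancy_pse, series_center, pull):
    """Toy central-tendency model (an assumption, not Lambercier's).

    The judgment that imperfect constancy alone would produce
    (`constancy_pse`) is drawn toward the middle of the series of
    variables by a fraction `pull` between 0 and 1.
    """
    if not 0.0 <= pull <= 1.0:
        raise ValueError("pull must lie between 0 and 1")
    return (1.0 - pull) * constancy_pse + pull * series_center


def constant_error(pse, standard):
    """Signed deviation of the PSE from the value of the standard."""
    return pse - standard


if __name__ == "__main__":
    standard = 10.0
    child_pse, child_pull = 13.0, 0.6   # hypothetical: large error, strong pull
    adult_pse, adult_pull = 10.5, 0.2   # hypothetical: small error, weak pull
    for center in (10.0, 16.0):         # series centered on, then away from, standard
        child = constant_error(observed_pse(child_pse, center, child_pull), standard)
        adult = constant_error(observed_pse(adult_pse, center, adult_pull), standard)
        print(f"center {center}: child error {child:+.2f}, adult error {adult:+.2f}")
```

With the series centered on the standard, the stronger pull assumed for the child cancels most of his constant error and so masks the age difference; with the center displaced, his errors follow it more closely than the adult's do.
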

A further study by Lambercier 
(1946b) provides information on the 
role of reference stimuli in the visual 
field. A number of sticks were placed 
horizontally or vertically along the 
line of sight between variable and 
standard. The only condition ma- 
terially affecting the size judgments 
was one in which the vertically 
placed reference sticks were all of 
equal size, and equal to the standard 
as well, thus enhancing linear per- 
spective cues. The beneficial effect 
of this condition increased with age; 
however, since the equivalence of the 
reference sticks to the standard was 
demonstrated to the Ss, this may ac- 
tually represent more a cognitive 
than a strictly perceptual perform- 
ance; the fact that 94% of the adult 
judgments were exactly correct sug- 
gests as much. 

An additional factor relating to 
the stimulus conditions which has re- 
ceived some attention in size con- 
stancy research is that of the hori- 
zontal separation between the stim- 
uli. Frank (1927), commenting on 
Beyrl's (1926) attempted replication 
of her earlier study (Frank, 1925), 
attributed the relatively poor per- 
formance of Beyrl’s Ss to the smaller 
horizontal separation between the 
two stimuli, which she believed 
would work against the perception of 
their objective size in young chil-
dren. Comparing additional results
obtained both under Beyrl's condi-
tions of horizontal separation and
under greater separation—the extent
is not specified—she obtained in fact
superior results for the three- and
four-year-old Ss under the latter con-
ditions, though little difference for
older children.

Piaget and Lambercier (1943b) 
have investigated this problem more
systematically; they find an over-all
increase of the average error under 
wide lateral separation between 
standard and variable, but little age 
difference in this respect. But the 
lateral separation used was rather ex- 
treme; in fact, depth and lateral sep- 
aration were confounded here, as the 
variable was presented 3 m. to the 
side of the standard, with S directly 
behind one of the two. 

A much more important factor ap- 
pears to be the position of the 
standard and variable relative to the 
S. Both Akishige (1937) and Piaget 
and Lambercier (1943b, 1951a, 
1956a) find higher degrees of con- 
stancy when the standard is in the 
far position. In fact, Piaget and 
Lambercier find overconstancy at all
age levels in this situation, while this 
appears only after the age of 8 years 
when the standard is near (cf. Piaget 
& Lambercier, 1956a, Table 4, p. 277). 
This effect represents of course a spe- 
cial case of the “error of the stand- 
ard” discussed in the methodology 
section. (It may be noted that both 
Burzlaff [1931] and Akishige [1937] 
find a similar effect in brightness 
constancy: when the standard is in 
the dark, constancy is considerably 
enhanced.) The differential effect of 
this error at different age levels does 
not emerge very clearly from these 
studies, but it is obviously a factor to 
be taken into account in interpreting 
a given constant error in terms of its 
absolute value. 

Finally, the variable of distance 
needs to be considered. First of all, 
the relative distance between the 
stimuli has recently been shown to be 
a significant determinant of develop- 
mental differences. Cohen, Hersh- 
kowitz, and Chodack (1958) found a 
significant increase in constancy be-
tween 5 and 17 years when a standard
at 2 m. was compared to a variable
at 8 m.; when the former was moved
back to 6 m. from S, the age dif- 
ferences disappeared. As might be 
expected, there is a general reduction 
of error in this situation, and cor- 
respondingly less room for develop- 
mental change. For both situations, 
however, these authors found a sig- 
nificant reduction with age in the in- 
terval of uncertainty (i.e., the width 
of the DL), which is confirmed in the 
studies of Lambercier (1946b) and
Piaget and Lambercier (1943b), for
similar conditions involving single
presentation of the variable.

Perhaps a more interesting factor is 
the role of the absolute distance of 
the stimuli from S. Its investigation 
would throw light on the develop- 
mental aspects of the so-called sec- 
ondary cues to distance emphasized 
by Gibson (1950). This question has 
received little study thus far, how- 
ever. Zeigler and Leibowitz (1957), 
in an experiment modeled after the 
classical study of Holway and Boring 
(1941), did determine size constancy 
at distances up to 100 feet, in an in- 
door setting. The results, from only 
13 Ss (eight boys aged seven to nine 
and five adults) showed that the 
judgments of the boys conformed to 
the law of the retinal angle to a sur- 
prising extent, whereas the adults, in 
agreement with Holway and Boring’s 
findings for full-cue conditions, showed
overconstancy (at least up to 60 feet).

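
What conformity to the "law of the retinal angle" implies for the size of a match can be made explicit. The following is a minimal sketch, assuming a comparison stimulus at a fixed near distance that is adjusted to match a standard farther away; the function names and the numerical values are hypothetical and are not those of Zeigler and Leibowitz.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle (degrees) subtended by an object of the given
    size at the given distance (same units for both)."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def retinal_angle_match(standard_size, standard_dist, comparison_dist):
    """Comparison size predicted by the law of the retinal angle:
    the comparison subtends the same angle as the standard."""
    return standard_size * comparison_dist / standard_dist


def constancy_match(standard_size):
    """Comparison size predicted by perfect size constancy."""
    return standard_size


if __name__ == "__main__":
    standard_size = 20.0    # hypothetical, inches
    comparison_dist = 10.0  # hypothetical, feet
    for standard_dist in (10.0, 30.0, 60.0, 100.0):
        retinal = retinal_angle_match(standard_size, standard_dist, comparison_dist)
        print(f"standard at {standard_dist:>5} ft.: "
              f"retinal match {retinal:.2f}, constancy match {constancy_match(standard_size):.2f}")
```

Matches lying near the first value at every distance follow the law of the retinal angle; matches at the second value represent perfect constancy, and matches exceeding it represent overconstancy.
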
Developmental changes in size 
constancy in a Gibsonian, full-tex- 
ture field are the subject of one of the 
experiments of the thesis by Edgren 
(1953). His stimulus materials con-
sisted of a series of photographic
slides on which a standard stick was 
displayed at a distance in an empty 
field, with a series of variables in the 
foreground (cf. Gibson, 1950, pp.
184f). Distance and size of the stand-
ard varied from slide to slide, but the
results are given only for the com-
bined series; they indicate only very
irregular age changes between 8 and 
16 years. If the central-tendency 
effect postulated by Lambercier 
(1946a) under these conditions of ser- 
ial presentation of the variable is a 
real one, the size of the standard may 
have been an important factor. 

One analysis which provided Ed- 
gren with significant age differences 
in this experiment concerned judg- 
ments on two especially prepared 
slides in which the context had been 
masked completely. The effect of 
the loss of cues proved relatively 
greater for adults, which is just the 
opposite of what one might expect. 
Again central tendency effects, as 
well as more general uncontrolled 
variables (all of Edgren’s experi- 
ments were group-administered) play 
unknown roles here. 

Although not varying distance 
systematically, a study of Dukes 
(1951) is of incidental interest here, 
since it represents an adaptation of 
Brunswik’s (1944) “ecological repre- 
sentativeness” approach, in which S 
judged the size of a large number of 
objects in his natural environment, 
over a large range of sizes and dis- 
tances. The results for a single six- 
year-old child actually yield a higher 
correlation between judged and ob- 
jective size (.99) than that obtained 
by Brunswik for an adult for a com- 
parable set of stimulus objects (.95). 
An important difference in the two 
studies, however, is Dukes’ use of 
comparative judgments, as opposed 
to the absolute-judgment procedure 
of Brunswik. 

Rounding out this review of the 
work on size constancy are a few 
studies employing instructions for
projective- (retinal-) size matches. 
Piaget and Lambercier (1951a, 1956a) 
have made an extensive study of such
judgments in their most typical con-
stancy situation (standard at 1 m.,
variable at 4 m. or vice versa). The 
rather surprising result was that 
the younger children, aged seven to 
eight, provided they could under- 
stand the instructions at all, per- 
formed considerably better than the 
10- to 12-year group, and even some- 
what better than the adults. Thus 
there was a U-shaped trend here, as
in Würsten’s (1947) study on the
horizontal-vertical illusion. The au- 
thors suggest that for the youngest 
children there is less interference 
from the perception of the true size 
of the distant object, due to their 
tendency to perceive objects inde- 
pendently of the spatial framework. 
With increasing age this framework 
becomes increasingly determining, 
so that projective size becomes pro- 
gressively harder to abstract, until 
the intervention of cognitive mech- 
anisms mediating the perception 
of perspective relationships become 
available (i.e., in the adult). What- 
ever the merits of this explanation, 
the superiority of the youngest group 
remains somewhat mysterious, par- 
ticularly since they were at the same 
time the most accurate in their objec-
tive-size matches (the older Ss ex-
hibited overconstancy). It should be
pointed out, incidentally, that even
the best judgments of projective size 
fell far short of the actual projective 
match; at all ages there was thus a
strong effect due to objective stim-
ulus size.

The problem of [...] mates has been
att[...] nell (1935), [...] aged 9 to 12
[...] photographed [...]
the photographs were viewed both
under ordinary and stereoscopic con-
ditions. In general, the instructions
were apparently effective in leading 
to projective-size judgments, as re-
flected in constancy ratios between
.18 and .25. The children showed a
greater influence of the actual, as op-
posed to the photographic size of the
objects, in agreement with the find-
ings of Piaget and Lambercier for a
comparable age range. At the same
time, as regards the change in judg-
ment with increasing distance, the
results for the children were between
those for the adult men and women. 

Distance-constancy. In his discus- 
sion of the size-constancy problem, 
Gibson (1950, pp. 163ff) points out 
that not only the size of the objects, 
but the distances between them tend 
to remain constant as the individual 
moves through his normal environ- 
ment. Massucco Costa (1949) has 
explored this question developmen-
tally, asking S to choose a variable
interval at 4 m. equal to a standard
interval marked off at 1 m. The in-
tervals were either empty (delimited
by cubes or rulers) or filled. A de- 
velopmental trend was in evidence 
only for the empty-interval judg- 
ments, which fell considerably short 
of constancy for five- and seven-year- 
old children (though not so much for 
the six-year-old group), while older 
children and adults exhibited nearly 
complete constancy.

A similar study by Denis-Prinz- 
horn (1959) is of particular interest, 
since it focuses on the question of 
the correlation between size and dis- 
tance perception. In her experiment 
—which forms part of a dissertation
currently in progress at Geneva—S
made judgments both of the relative
size of two sticks at 1 m. and 4 m.
and of relative distance over the
same stimulus field (the latter by a
bisection procedure). Denis-Prinz-
horn finds a developmental trend
from under- to overconstancy for
both the size and the distance judg-
ments. More important is her demon-
stration that these two judgments,
which are virtually uncorrelated for 
adults—thus confirming similar re- 
sults obtained by Gruber (1954)—are 
on the contrary highly correlated for 
young children (five to seven years). 
This finding suggests that cues which 
in adults operate relatively inde-
pendently in the two situations are
much more closely interrelated in the 
child. Exactly which cues are in- 
volved in this developmental change 
remains to be determined, however. 

Concluding remarks on the constan-
cies. The general picture with respect
to the development of the constancies 
may perhaps be summarized as fol- 
lows: some “regression to the real 
object”—to use Thouless’ (1931) ex-
pression—appears to develop very
early in life, probably by the end of 
the first year, although most of the 
data on this question are unreliable. 
Under conditions in which constancy 
is enhanced generally (e.g., serial pres- 
entation of the variable; short dis- 
tances between standard and vari- 
able), the progress of development is 
almost complete at a fairly early age, 
i.e., five or six years. Further in-
creases in constancy occur through-
out later childhood and adolescence
in relatively impoverished laboratory 
situations and with single presenta- 
tion of the variable; this trend typi- 
cally culminates in overconstancy in 
the adult. 

It appears, then, that the young 
child requires a greater variety of 
cues than the adult in order to main- 
tain invariance in his perception. 
This conclusion is also in conform- 
ance with the reduction with age in 
the correlation between size and dis- 
tance judgments (Denis-Prinzhorn, 
1959) and with the apparently
sharper rise in constancy up to early
adolescence for shape and brightness 
constancy, where the available cues 
are not so manifold, as compared 
with size constancy (Klimpfinger,
1933). One among the many prob-
lems that remain to be clarified in
this regard, however, is the meaning
of the decrease in shape and bright-
ness constancy after early adoles-
cence.


Depth Perception 

While depth perception is clearly 
implied in the above-mentioned study 
on distance judgment by Denis- 
Prinzhorn (1959), the discrimination 
of two stimuli as relatively nearer or 
farther represents a somewhat differ- 
ent problem, and one less obviously 
related to size constancy. Gross dif- 
ferentiation of depth appears to de- 
velop quite early, according to the 
recent work by Walk and Gibson 
(1959), who have demonstrated that 
6- to 14-month-old infants will rather 
consistently avoid a “visual cliff” 3
m. deep, on the basis of perceptual 
cues to depth. As we saw in our dis- 
cussion of size constancy, the results 
obtained by Cruikshank (1941) like- 
wise suggest strongly that after the 
age of six months infants regularly 
respond to the nearer of two objects 
when their separation in depth is 
considerable. 

As for more finely differentiated 
judgments of the relative distance 
of two objects, an extended investiga- 
tion by Updegraff (1930) shows that 
this type of discrimination is rather 
well developed already in four-year- 
old children, who are only slightly 
inferior to adults. This applies both 
to short (5 m.) and long (195 m.) dis- 
tances of the objects from the ob- 
server. Furthermore, neither the 
children nor the adults show any ap-
preciable differences between monoc-
ular [...] other hand, under the presumably
more difficult conditions of a Bour- 
don-type judgment (the Ss, looking 
through a small aperture, judged 
whether a ball dropped in front or be- 
hind a partition), the children were 
consistently less accurate than the 
adults. 

This last type of judgment is of 
course a function of retinal disparity. 
Developmental changes in the effec- 
tiveness of this cue are shown in sev- 
eral studies of stereoscopic per- 
ception (Carr, 1935; Gesell, Ilg, 
& Bullis, 1949; Leyer, 1939). Carr 
(1935) reports that below the age of 
three years children were unable to 
perceive depth in stereoscopic pic- 
tures, even after a period of training; 
Gesell, Ilg, and Bullis (1949) find 
similar difficulties up to the age of 5. 
But, as Munn (1955, p. 269) points 
out, these results are of equivocal 
significance, in the absence of ade- 
quate control over S’s ability to focus 
effectively stereoscopic images. That 
this factor may play an important 
part is indicated by the contrasting 
results obtained by Johnson and 
Beck (1941). These authors utilized
stereoscopic slides projected as polar- 
ized images on a screen (i.e., as in 
“3-D” motion pictures), which the 
child viewed through polaroid glasses. 
If the projected image of the object 
were perceived in depth, it would be 
localized at a point between S and the
screen. Although the criteria for S's
perception of depth are not made too
explicit, the authors’ findings seem to
indicate that even the youngest Ss
(aged two years) did perceive the im-
age three-dimensionally.

While the foregoing studies in- 
volved images of actual three-dimen- 
sional objects, Leyer (1939) used as 
his stimulus materials a set of stereo-
scopic geometrical drawings, shown
under tachistoscopic exposure. In
an 8- to 13-year-old group only 40% 
of the drawings were perceived three- 
dimensionally, as against 60% in a 
15- to 19-year-old group. These two 
groups likewise responded differently 
to a task in which a vertical line, 
viewed stereoscopically, was to be 
adjusted so as to be midway between 
two other parallel lines. If the lines 
were all perceived in the same plane, 
the adjustment would naturally dif- 
fer from that made if the middle line 
were perceived as displaced in depth. 
Here again a somewhat larger pro- 
portion of the older Ss responded in 
terms of depth. 

Lastly, we may consider a rather 
different aspect of depth perception, 
namely the three-dimensional in- 
terpretation of two-dimensional fig- 
ures. Very little work has been car- 
ried out on this question develop- 
mentally. The Gestaltists’ assump- 
tion of the intrinsic determination of 
such depth effects receives some in- 
direct support, however, from studies 
by Slochower (1946) and Ishii (1956). 
The latter asked for descriptions of 
Y- and T-shaped drawings, by Ss 6
and 14 years of age; no differences 
were found in the responses given by 
the two groups. Slochower used a 
wider range of stimuli, though less 
systematically constructed, again 
finding little age difference in the
range from five to nine years in S’s
reproductions of the designs by means 
of modelling clay. In both of these 
studies the stimulus characteristics 
appeared to be the primary determ- 
inants of the perception of depth. 


Form Perception 


Form discrimination in infancy. As
in the case of the constancies, the
question of the perception of form at
very early age levels has attracted a
fair amount of attention. To antici-
pate the main trend of the findings
to be presented, it seems that, while 
form discrimination can be demon- 
strated fairly early in infancy, it is 
not a very potent aspect of the stim-
ulus in the perception of young chil-
dren.

Ling (1941), by utilizing a dis- 
crimination-learning procedure in 
which the stimuli (sugar-coated three- 
dimensional forms) coincided with 
the rewards, was able to show dis- 
crimination between various simple 
geometrical shapes as early as six 
months; the differential response was 
furthermore little affected by changes 
in size or orientation of the correct 
figure. Of particular interest is Ling’s 
finding that an initial difficult dis- 
crimination between a circle and an 
ellipse led to greater transfer when 
the ellipse was replaced with angular 
objects than that found for the re- 
verse order (circle vs. square, fol- 
lowed by circle vs. ellipse). 

This study thus shows that the 
mechanisms for shape discrimination 
are functional at an early age, or at 
least can become so following expo-
sure to a limited amount of specific
training (it should be noted that in 
this experiment, which extended over 
a period of several months for any 
given infant, learning and matura- 
tional factors are to some extent con- 
founded). A probably important 
contributing factor to such early dis- 
crimination is the use of three-dimen- 
sional objects, since size discrimina- 
tion is typically better and develops 
earlier for three-dimensional as com- 
pared with equivalent plane figures 
(Stevenson & McBee, 1958; Volkelt, 
1926; Welch, 1939b). 

Rüssel (1931) compared a variety
of cues (contour vs. solid form, size, 
thickness, round vs. pointed form 
and symmetry) with respect to speed 
of learning of a discrimination and 
[...] cue. The num-
ber of Ss (24 [...]
to five years) was too small to allow
systematic age-group comparisons,
but it is of interest that, although the 
cues were all roughly equivalent as 
regards the ease of the original dis- 
crimination, the transfer was consid- 
erably less for the last three cues— 
representing aspects of shape—than 
for the first two. Transfer trials in 
which pairs of cues were confronted 
with each other likewise showed con- 
tour vs. surface to be the most potent 
and round vs. sharp and symmetry 
to be the least potent cues at this 
general age level. 

That a preferential response to the 
contour vs. solid-form cue as opposed 
to shape represents a relatively prim- 
itive stage of perceptual development 
is further indicated by an investiga- 
tion by Knoblauch (1934), compar- 
ing mentally deficient Ss, aged 7 to 
21, with normal children, aged four 
and one-half to eight and one-half. 
Following an initial discrimination 
between a solid circle and various 
angular forms, the mentally defi- 
cient group was much more likely 
than the normals to choose solid fig- 
ures, regardless of shape, in transfer 
trials, while the normals tended to 
respond on the basis of the round vs. 
angular shape of the stimuli. 

Preferential choices on transfer 
trials of a stimulus-equivalence ex- 
periment are of course of equivocal 
significance as regards S’s perception 
of the various aspects of the stimuli. 
Yet the genetic primacy of the solid- 
vs. outline-form cue is of considerable 
interest, particularly in relation to 
Hebb’s theory of perceptual develop- 
ment (1949), which postulates that 
segregation of figure from ground is 
present at birth, but that the per-
ception of form requires, at least in
the human, a fairly extensive period
of sensory learning.

The role of spatial orientation in the 
discrimination of form. As noted 
above, Ling (1941) found that the 
discrimination of his infants was little 
affected by changes in the orienta- 
tion of the stimuli; this was true also 
in Gellermann’s (1933) work with 
slightly older children. As a matter 
of fact, this seems part of a general 
tendency of young children to ignore 
spatial orientation in their response 
to form, which was observed many
years ago by Stern (1909), both in 
the drawings of children and in their 
identification of familiar objects. 

There are really two aspects to 
this problem: a positive and a nega- 
tive one. The first refers to an actual 
superiority of children over adults in 
recognizing stimuli under changes in 
orientation, at least relative to their 
performance under normal condi-
tions. This superiority has been
found both for the recognition of in-
verted pictures (Mouchy, quoted by
Köhler, 1940, p. 20f) and of non- 
sense words turned through 90° 
(Oetjen, 1916). The results of New- 
hall (1937), showing that the per- 
formance of three- to five-year-old 
children in a visual acuity task is not 
influenced by either right-left or up-
down reversals, point in the same
direction. 


The other side of this coin is that
young children experience great diffi-
culty in discriminating between
forms which are mirror-images or up-
down reversals of one another. This
is a well-known fact to students of
the early development [...]
([...], 1935; Ver-
non, 1957). Rice’s (1930) study,
sometimes cited in this connection, is 
somewhat equivocal, since it only 
showed a lack of spontaneous ver- 
balization of the inversion of a figure 
in a matching task on the part of the
younger children (four to five years). 
Similarly ambiguous is Burkhardt’s 
(1925) finding of a sharp decrease be- 
tween six and eight years in the inci- 
dence of orientational changes in the 
child’s reproductions of a set of sim-
ple drawings. But there is more com-
pelling evidence in an unpublished
thesis by Newson (quoted by Ver- 
non, 1957, p. 25), indicating an out- 
right inability of five-year-old chil- 
dren to detect the difference between 
a shape and its mirror image, even 
when it is pointed out. Upside-down 
reversals caused somewhat less diffi-
culty for these children than mirror-
image reversals—a point most re-
cently confirmed in a discrimination-
learning experiment by Rudel (1959),
with three- to five-year-old children. 

The work considered thus far has 
dealt in the main with simple geo- 
metric forms, or individual familiar 
objects. A paper by Hunton (1955) 
throws an interesting sidelight on 
this question, for this author reports 
that in the description of complex 
meaningful pictures the inversion of 
the stimulus has the effect, up to age 
seven, of interfering with the per- 
ception of the relationship between 
the objects. 

It may be pointed out in conclu- 
sion that this evidence on the weak 
role of the spatial orientation in 
young children seems to support the 
views of Würsten and Piaget, ad-
vanced to explain the developmental
changes in the horizontal-vertical 
and other related illusions (cf. above). 
The evidence for improvement in 
spatial localization, reviewed in an 
earlier section, also fits rather well 
into this general picture. 

Gestalt principles. Followers of the 
Gestalt school of perception have, in
general, assumed an innate, intrinsic 
determination of their principles of 
organization in the perception of
form (cf. Zuckerman & Rock, 1957), 
and have accordingly not concerned 
themselves very intensively with the 
study of developmental changes. 
Piaget, among others, has taken ex- 
ception to this view, arguing that the 
full-fledged operation of these princi- 
ples is achieved only as the end-re- 
sult of a prolonged developmental 
process. One particular study in
which he has attempted to substanti- 
ate this claim (Piaget, Maire & 
Privat, 1954) consists of a compari-
son of the ordinary Müller-Lyer illu-
sion, in which the two figures are
shown superimposed vertically, with 
that resulting when the ends of the 
arrows are joined by vertical lines, so 
as to form a square in conjunction 
with the main lines of the Müller- 
Lyer figures (cf. Fig. 3). As one 
might expect, the connecting lines 
greatly reduce the illusion, in ac- 
cordance with the stabilizing proper- 
ties of a “good figure” such as the 
square. This effect is, however, rela-
tively smallest for the youngest chil-
dren, and greatest in adults, thus
demonstrating, according to the au- 
thors, the increasing efficacy with 
age of these stabilizing properties. 
But the fact that the absolute amount 
of the difference between the illusions 
under the two conditions decreases 
with age somewhat weakens the force 
of this argument. On the other hand, 
the decrease with age in both inter- 
and intrasubject variability found 
for the modified figure, which con- 
trasts with the increase in variability 
with age found for the ordinary 
Müller-Lyer figure, does indicate an
increase in consistency which would 
support Piaget’s point. 

For some of the other principles of 
organization—that of continuation 
in particular—developmental changes 
are similarly demonstrable. A major 


FIG. 3. Modification of Müller-Lyer figure,
showing role of “good figure.”


source of evidence on this problem is 
the investigation by Rush (1937), 
who has solved the difficult problem 
of quantification in this area with 
notable success (besides undoubtedly 
establishing a record for this type of 
experimentation as regards the total 
number of Ss tested: 8589!). Thus, 
in order to assess the strength of the 
proximity factor, this author pre- 
sented her Ss with matrices of dots, 
varying the spacing of rows or col- 
umns and asking them to indicate by 
straight lines the direction in which 
they perceived the dots. No appre- 
ciable age changes emerged with re- 
spect to the role of this factor be- 
tween the first grade and high-school 
levels—a finding which is in agree- 
ment with that noted by Schroff 
(1928), on the basis of rather less im- 
pressive data. 

More interesting yet in Rush’s 
study is her confrontation of various 
factors with each other. The author 
opposed the proximity factor to that 
of continuation (manipulated in terms 
of the number of dots in the rows and 
columns of the matrix), as well as to 
that of similarity (manipulated by 
presenting elements differing in shape
such that the rows were homogeneous 
and the columns heterogeneous, or 
vice versa). In both cases the influ- 
ence of proximity decreased with age, 
relative to that of the other factor. 
Finally, juxtaposing the factors of 
continuation and similarity, Rush 
found an increase in the role of con- 
tinuation, although this age trend 
was reversed in the case of the col- 
lege group (similar anomalous results 
for the college-age Ss are in evidence 
throughout this report). 

Of Rush’s findings perhaps the one 
of most general significance is the in- 
crease in the effect of continuation 
with age, at least up to high school. 
Bearing in mind that this factor is 
represented here through discontinu- 
ous stimulus patterns—i.e., rows and 
columns of discrete dots and the like 
—it is tempting to relate this trend 
to the increase with age in the ability 
to achieve closure on the basis of in- 
complete figures, as shown in de- 
velopmental studies on the recogni- 
tion of pictured objects from partial 
cues (Brenner, 1957; Gollin, 1959a; 
Mooney, 1957; Schober & Schober, 
1919; Van der Torren, 1907). (The 
normative data obtained by Street 
[1931] for his “Gestalt Completion” 
test from third-grade, sixth-grade, 
and high-school children fail to show 
appreciable age differences, possibly 
due to the relatively small number of 
stimuli employed.) 

We might also mention in this con-
nection Gollin’s (1959b) work on the
discrimination of tactual stimuli 
formed by patterns of tacks. This 
task is not only substantially more 
difficult for eight- to 10-year-old chil- 
dren than for adults, but the children 
are handicapped to a much greater 
extent by the presence of “noise,” 
i.e., background stimuli not forming 
part of the patterns, in the field of 
touch. 
If continuation in discontinuous or
incomplete patterns becomes thus 
fully effective only at a rather late 
age, the role of this factor in the per- 
ception of uninterrupted line-draw- 
ings is on the contrary very strongly 
developed in young children (Ghent, 
1956; Piaget & von Albertini, 1954). 
This difference between continuous 
and discontinuous stimuli is shown 
clearly in the study by Piaget and
von Albertini (1954): four- to
five-year-old children experienced
virtually no difficulty in tracing the
outlines of forms when these over-
lapped with other forms. However,
when these same forms were pre- 
sented individually, but as dashed 
patterns, children up to the age of six 
could generally not recognize them; 
success in tracing these dashed forms 
superimposed as in the overlapping 
figures did not come until a still later 
age (between seven and nine years). 
Similarly, the completion of a par- 
tially mutilated geometrical figure 
also develops relatively late (i.e.,
about seven years) according to this
study, these gaps proving especially
difficult where oblique lines or angles
are involved. Here immature comple-
tions, instead of conforming to prin-
ciples of simplicity, tended frequently
in the direction of “empirical,” i.e.,
familiar, Gestalten.

These results thus suggest the im- 
portance of distinguishing between 
the role of closure in the perception 
of incomplete stimuli and the role of
“good continuation” when continu-
ous line patterns are involved. With
respect to the latter, Ghent (1956)
has confirmed its effectiveness in
children as young as four years in the 
recognition of overlapping forms. On 
the other hand, in a second portion of 
Ghent’s study, the recognition of em- 
bedded figures of the Gottschaldt 
type proved much more difficult for 
young children, giving rise to a
marked improvement between 4 and
13 years. This improvement is cor- 
roborated by other developmental 
studies of the perception of embedded 
figures (Gollin, 1956; Witkin et al., 
1954), as well as by the similar results 
found by Heiss (1930) with respect to 
the ability to detect an angular shape 
when it forms part of a continuous 
outline. 

The evidence reviewed here sug- 
gests, then, that young children have 
a very strong tendency to follow what 
may be loosely described as a “line
of least resistance,” i.e., a continuous 
linear path, in their perception of 
complex linear forms. This tendency 
clearly serves them in good stead in 
the recognition of overlapping forms, 
while it interferes with the isolation 
of component parts of a total outline 
(as in Heiss’s study), and more 
particularly with the perception of 
angular forms competing with the 
continuous line pattern, as in the 
Gottschaldt figures. In this connec- 
tion a study of developmental 
changes in eye movements in the vis- 
ual exploration of such figures should 
prove revealing. At the same time 
the application of information-theory 
principles (cf. Attneave, 1954) sug- 
gests itself as the means for a more 
systematic attack in the study of 
these developmental changes. 

Whole vs. part perception. Some of 
the problems just discussed, notably 
in the consideration of embedded 
and overlapping figures, are closely 
related to the question of the per-
ception of part as against whole as-
pects of complex forms. Studies of
age changes on the Rorschach repre-
sent our main source of information
concerning the developmental proc-
esses involved here. They generally
agree in finding a change from a pre-
ponderance of global percepts, de-
termined by an unstructured per-
ception of the whole in early child-
hood, through a phase in middle
[...], to an eventual
more integrated perception of the
whole in the adult (Ames, Learned,
Metraux, & Walker, 1953; Dworetzki,
1939; Hemmendinger, 1953).

In view of the lack of control over 
either the stimulus or the response in 
the Rorschach test, and the conse- 
quent equivocal significance for per- 
ceptual processes of the S’s interpre- 
tations of these blots, it is of some im- 
portance that this formulation of the 
developmental changes in part-whole 
perception has been substantially 
confirmed in several other studies 
which permitted a more systematic 
confrontation of part and whole as- 
pects. Foremost among these is the 
investigation of Dworetzki (1939) 
which, besides the Rorschach cards, 
employed meaningful designs made 
up of individual parts which them- 
selves represented meaningful ob- 
jects. Here the trend in the Ss’ 
identifications of the designs par- 
alleled rather closely that found in 
the Rorschach studies: the youngest 
children responded merely to the de- 
sign-as-a-whole, while somewhat 
older children responded mainly to 
the component parts; adolescents 
and adults, finally, included both of 
these aspects in their descriptions of 
the stimuli. 

A further study by Selinka (1939) 
represents perhaps a still more ade- 
quate approach to the question of the 
integration of the parts in the per- 
ception of the whole. This author 
constructed a task similar to the 
Kohs Block-Design test, in which a 
particular circular pattern shown to 
S had to be reproduced by means of 
four blocks, each of which contained
a quarter of a circle made up of dif-
ferent kinds of patterns on the dif-
ferent sides. Thus the extent to
which the pattern of the model was
taken into account in the arrange-
ment of the blocks showed the child’s
attention to detail in his reproduction 
of the total circular outline. This 
task showed a considerable improve- 
ment from the kindergarten level, 
where the component patterns were 
usually ignored, through successive 
school-age groups up to 10 years. 
Actually, the major portion of this
improvement was from the seven- to 
the eight-year-old level, suggesting 
the possible influence of specific ex-
perience in school.

Various other studies (Segers,
1926; Smith, 1914) present further 
evidence regarding the difficulty ex- 
perienced by young children in per- 
ceiving small details in complex but 
meaningful wholes. However, when 
the whole does not itself represent a 
strongly competing aspect of the
stimulus, even young children can 
respond to details. This is shown in 
an experiment by Van der Torren 
(1907), who presented children with 
drawings of familiar objects at pro- 
gressively increasing levels of com- 
pleteness; at each trial they not only 
tried to identify the object, but also 
to point out the parts in the picture 
that had been added since the previ- 
ous exposure. In line with the above- 
mentioned age trends for incomplete 
figures, the identifications improved 
with age (i.e., in terms of the degree 
of completion required). Yet the 
recognition of the added details was
almost perfect from the youngest age
level (four [...]). Schober and
Schober (1919), however, replicated
[...] the single modifica-
tion [...] presentations of the
[...] contained the complete out-
line of the object, so that the succes-
sive [...] differed only in the in-
ternal details. The result was that
the recognition of the detail added to 
each exposure was greatly reduced for 
the younger children, increasing from
36% to 88% between the ages of four 
and eight years. Thus the presence 
of the completed outline of the object 
seemed in effect to interfere with the 
young child’s perception of the parts
of the figures. 

Meili (1931) has taken these find- 
ings, together with related evidence, 
as indicating the young child’s in- 
ability to attend to both the part and 
the whole at the same time. Which 
he will attend to will depend, accord- 
ing to this author, on the degree of 
structure present in the stimulus: 
for strong or simple wholes, the whole 
will be perceived, whereas for weak 
or complex wholes there will be a con-
centration on detail. Although this
formulation represents a promising 
advance over the overgeneralized 
statement that young children are 
unable to perceive details in a whole, 
it is obviously of limited usefulness, 
pending an operational definition of
strong vs. weak wholes and a deter-
mination of the [...]
variable with [...].
Conceivably th[...]
reduced to those [...]
line and degree of redun-
dancy—a conceptualization which
would bring this problem, together
with that discussed in the previous 
subsection, into the analytic frame- 
work of information theory. 


Number 


The role of configurational factors 
in the identification of number, under 
exposure times too short to permit 
counting, is the subject of experi-
ments by Freeman (1910) and Bie-
müller (1930). Freeman showed
groups of from 3 to 12 dots arranged 
in a variety of configurations, as well 
as in vertical and horizontal rows. In 
general, the accuracy of the children 
(6 to 14 years old) started to lag be-
hind that of adults for groups of dots
greater than five, with the exception
of complete patterns (e.g., a 3×3
matrix to represent 9), which the 
children in particular found easier 
than corresponding incomplete pat- 
terns. There were, however, only 14 
Ss in the children’s group, thus ruling 
out the determination of progressive 
age changes. 

Biemüller (1930) limited himself
to linear patterns of from 3 to 10 
marbles exposed for ca. 1 sec. They 
were spaced either uniformly, or with 
gaps between some of the marbles, so 
as to break the row up in various 
ways, some rhythmical, some sym- 
metrical about the middle of the row 
and some irregular. The added re- 
dundancy in the rhythmical or sym- 
metrical patterns proved relatively 
ineffective in improving accuracy of 
the estimates below adolescence. 
There was, in fact, a marked tend- 
ency on the part of the younger 
children to ignore these groupings in 
their reproduction of the perceived 
patterns: rhythmical and especially 
symmetrical patterns were frequently 
turned into uniform, unpatterned 
rows. This study thus seems to indi- 
cate that the young child is to a de- 
gree handicapped in handling infor- 
mation with regard to patterning in 
discontinuous stimuli, which may be 
related to the relatively minor role of 
continuation at the younger age 
levels in Rush’s (1937) study. As 
for the apparently conflicting results 
of Freeman, it should be noted that 
they refer to patterns of matrices, in 
which the configurational element is 
presumably much stronger; further- 
more, the sampling is much less ade- 
quate and comprehensive than in 
Biemüller’s experiment.

Lastly we may cite two studies 
which will serve as a convenient 
bridge to the following sections on 
time[...].
[...] from two to seven
stimuli in temporal succession, in in-
tervals of one second, either in the
form of tones or of visual flashes. For 
both modalities there was a regular 
and very substantial improvement 
from the first to the sixth grades in 
the S’s accuracy of identification of 
the numbers. Interestingly enough, 
at all levels performance was poorer 
for the flashes than for the sounds; 
there was some indication, moreover, 
that this difference between modal-
ities increased with age.

The second study, by Stambak 
(1951), although not dealing with 
number as such, is closely related to 
Biemüller’s work on patterning, but
over a temporal rather than a spatial 
dimension. In this experiment chil- 
dren between 8 and 15 years were 
asked to reproduce a series of 21 
heard rhythmic patterns, made up of 
three to eight elements in one to five 
subgroups. This kind of temporal 
integration again improved with age, 
but there was almost perfect con- 
sistency in the order of difficulty of 
the patterns at the different age 
levels. The major determinant was 
the total number of elements in the 
pattern, but for the larger numbers 
the number of subgroups was an im- 
portant contributing factor. 


Movement 

Apparent movement. Several studies 
on developmental changes in ap- 
parent movement have been reported 
(Brenner, 1957; Gantenbein, 1952; 
Meili & Tobler, 1931); all agree in 
finding a decrease in susceptibility 
to the perception of such movement 
with increasing age. The most thor- 
ough of the investigations on this 
problem is that by Gantenbein 
(1952), who has verified this age 
trend across a variety of conditions 


276 JOACHIM F. 


of stimulus exposure, systematically 
varying distance, temporal interval, 
exposure time and intensity, with 
fairly consistent results, i.e., an in- 
crease with age in the threshold be- 
tween succession and movement. 
The same applied to the threshold 
between movement and simultaneity, 
but here the age differences were 
much less pronounced. This was ap- 
parently due in large measure to the 
technical difficulty of producing an 
appearance of simultaneity in the 
adult (frequently no upper threshold 
could be determined). Indeed, in 
Brenner’s (1957) study, just the re- 
verse held true: age differences were 
greater for the movement-simultane- 
ity threshold, simultaneity being per- 
ceived at lower speeds by the younger 
children. 

Meili and Tobler (1931) attribute 
children’s greater susceptibility to
apparent movement to a lesser spa- 
tial stability of the stimulus in child- 
hood. They discount an explanation 
in terms of a greater tendency for 
fusion in children, showing that the 
fusion threshold for a Talbot disc in 
a five-year-old group is essentially 
equivalent to that of an adult group. 
While the number of Ss on which this 
comparison was based was rather 
small, several other experiments on 
the CFF confirm this finding (Hart- 
mann, 1934; Miller, 1942). 

Brenner (1957), on the basis of a
somewhat mistaken interpretation of
the argument proposed by Meili and
Tobler, suggests that according to
their explanation of the age changes
in apparent movement this phe-
nomenon should be correlated with
speed of closure. This she finds not 
to be so: there is, on the contrary, an 
inverse correlation between scores on 
a Street-Gestalt test and the thresh- 
old for apparent movement. But 
there seems to be little reason to ex-
pect a relationship between closure,
in the recognition of meaningful pic- 
tures from partial cues, and the 
stimulus displacement involved in 
apparent movement. 

A phenomenon undoubtedly re- 
lated to apparent movement is the 
subject of a study by Piaget and 
Lambercier (1951b): a square rotates 
along an eccentric circular path; as 
the speed of rotation increases, the 
rotating square changes first to a 
stationary cross and subsequently to 
two superimposed crosses, an inner 
and an outer one. Here again, as age 
increases progressively higher speeds
are required for the perception of the
single and double cross.

Movement in complex patterns. The
perception of the direction of move-
ment of a number of stimuli simul-
taneously displaced forms the topic
of an interesting experiment by Rey
(1954). A column of five dots is
shown which can be displaced either
to the right or left or remain station-
ary, each independently of the others.
Various patterns of movement can
thus be produced. The S’s task is to
specify (from memory) which dots 
moved on any given trial, and in what 
direction. Where two or more stimuli 
moved in opposite directions, the cor- 
rect identification of the direction of 
movement turned out to be particu- 
larly difficult for the youngest chil- 
dren (ages four to six): apparently 
they are unable to coordinate the left- 
right dimension with the vertical po- 
sition of the dot. Furthermore, even 
if errors of direction are ignored, cor-
rect identification of the dots dis-
placed on a given trial steadily de-
creased as the number of dots in-
creased, but to a much larger extent 
in young as compared with older chil- 
dren. 

Speed. In his unpublished thesis 
Edgren (1953) included tests of the 
relative speed of two moving objects
under a variety of conditions, as 
originally studied by J. F. Brown 
(1931) in adults. Age differences 
were, in general, either nonexistent 
or very irregular. The only sugges-
tion of an age trend was observed in a
rather difficult type of judgment in 
which the standard and variable were 
moving at right angles to each other. 

Introducing the dimension of 
meaning into their stimuli, Wapner 
and Werner (1957) found much more 
striking developmental changes in 
perception of relative speed. They 
prepared a set of pairs of outline 
drawings of familiar objects, one in 
each pair representing the object in a 
static position and the other showing 
it in a position suggesting motion. 
With both objects exposed simul- 
taneously on moving belts, the S had 
to adjust the speed of the latter (the 
variable) to that of the “static” ob- 
ject (the standard). For adults the 
PSE virtually coincided with the 
POE; not so for the children, who, to 
an extent progressively decreasing 
with age, overestimated the speed of 
the variable; thus for them the speed 
of movement was affected by the 
dynamic qualities suggested in the 
drawing. (In this experiment, as in 
many others, the possibility of a 
systematic bias due to the use of one 
type of stimulus as standard and the 
other as variable must not be over- 
looked; it is at least conceivable that 
Ss might tend to underestimate a 
constant as compared to a variable 
speed, and that this effect might in- 
teract with age. As noted previously, 
this whole problem of the “error of 
the standard” is badly in need of 
fuller and more systematic treat- 
ment.) 

Causality. Two developmental in-
vestigations of the perception of cau-
sality, based on the classical work of
Michotte [...], should be included
[...] an aspect [...].
Olum (1956) [...]
Michotte’s conditions, varying the
relative speeds of the two stimuli in-
volved. Whereas most adults per-
ceived the relationship between the
two stimuli either in terms of a re- 
leasing effect (for one combination 
of speeds) or of a pushing effect (for 
the other two), children’s responses 
were much more variable for any
given situation. In particular, where
the speed of the two stimuli was
identical or closely similar, a number
of children gave responses without
any causal implications, such as pass-
ing, crossing, and even “tunnelling.”
It is of course difficult to decide
whether the age changes observed
here reflect perceptual differences or
rather shifts in verbal habits in the
description of a perceived relation-
ship between two moving objects.
One result appears, however, to have
a well-founded perceptual basis:
where the two speeds were in the ratio
of 1:30, a large number of children
perceived them as mutually ap-
proaching—a finding which Olum
quite plausibly attributes to the in-
tervention of stroboscopic movement 
between the stimuli (which, as we 
have seen, occurs more readily in 
young children). 

Piaget and Lambercier (1958), in 
the most recent of the studies in the 
series from the Geneva laboratories, 
report on the same phenomenon; 
they likewise find noncausal types of 
responses in seven- to nine-year-old 
children, where the speeds of the two 
stimuli were in the ratio of 1 to 1. In 
fact, as late as 13 years, half of the 
Ss gave such responses. 

A variable more thoroughly ex-
amined in this study is the spatio-
temporal interval between the stim-
uli at the point where the second
starts to move. The younger children 
perceive a causal relationship be- 
tween the stimuli only as long as they 
perceive a physical contact between 
them, while for the adults the causal 
effect can operate across a certain 
gap. On the other hand children per- 
ceive physical contact much more 
readily in the absence of actual con-
tact, i.e., across much larger spatio-
temporal intervals. It appears plaus-
ible to relate this finding again to
their greater susceptibility to ap-
parent movement, both phenomena
demonstrating the lesser spatial sta-
bility of the stimulus for the younger
child, as argued by Meili and Tobler
(1931). 


Time 


Although the problem of time per-
ception would seem to be intimately
[...] this rela[...]
this subject, which has limited itself
to judgments of either unfilled time
intervals or of those filled by some
static stimulus.

Fraisse (1948) tested children at
ages 6, 8 and 10, as well as an adult
group, by means of a reproduction
technique, with time intervals of .5,
1, 5 and 20 sec., both filled and un-
filled. The main age difference ap-
peared in the judgments of the two
longest empty intervals, which were
grossly underestimated by the young-
est group; at the same time their
judgments for the 20-sec. interval
were chara... time intervals, or with
the longer intervals after the age of
eight. On the basis of these results,
Fraisse suggests a distinction between
time perception and time estimation,
the former applying only to very short
time intervals (e.g., 1 sec.). Thus this
author sees the younger children as
deficient primarily in their estima-
tion of time.

The judgmental nature of the in- 
feriority of younger children is borne 
out to some extent by a recent ex- 
periment by Smythe and Goldstone 
(1957). They report a steady de- 
crease, between seven years and 
adolescence, in the error made in 
identifying a 1-sec. interval by a 
method of limits; however, following 
repeated exposure to this interval 
identified as one second, the children 
from eight years upward improved 
very markedly in these absolute-type 
judgments, so much so that the
former age differences were wiped
out.

Lastly, an older study by Gilliland
and Humphreys (1943) is of interest,
since it compared judgments of time
intervals from 9 to 180 sec., obtained
by three different methods: estima-
tion, production, ... in two age groups
(... college adults). Al... Apparently,
for intervals of this length, the
mediation ... time scale is essential
even in a comparative judgment task;
this would be in line with Fraisse’s
point of the nature of the judgment
for these longer intervals. As for the
differences between the two age groups
in this study, the adults were more
accurate throughout, but there was no
interaction, either between age and
time interval, or between age and
method.


Perceptual Learning 


Various studies cited in this review
have given more or less incidental
evidence of practice effects; interest-
ingly enough the extent of such ef- 
fects is in general approximately pro- 
portional to the level of initial per- 
formance, and thus increases with 
age. This trend is seen in several of 
the studies from the Geneva Labora- 
tories (Lambercier, 1946a, 1946b; 
Piaget & Denis-Prinzhorn, 1953; 
Piaget, Maire, & Privat, 1954) in 
which judgments for a baseline con- 
dition were made at the beginning 
and at the end of the experimental 
session; the improvement usually re-
sulting may of course be due either to 
practice per se or to the effect of the 
interpolated conditions. 

The studies on cutaneous localiza-
tion (Dunford, 1930; Renshaw, 1930; 
Renshaw & Wherry, 1931; Renshaw, 
Wherry, & Newlin, 1930) all included 
extensive practice sessions under a 
single experimental condition, and 
thus provide more systematic and 
unambiguous information on learn- 
ing effects in this type of task. Dun- 
ford found improvement with prac- 
tice again closely proportional to the 
initial performance at each age level; 
similarly in Renshaw’s (1930) study 
the children, who were initially supe- 
rior to the adults, likewise benefited 
from their practice to a greater ex-
tent.

We may note also a related ex-
ploratory study by Gibson and Gib-
son (1955) on the recognition of non-
sense designs: over a series of trials,
children between 8½ and 11 years of
age showed considerable improve- 
ment, while a six- to eight-year-old 
group gave very little evidence of 
learning. This difference seemed to 
be related to the Ss’ readiness to 
verbalize the main stimulus dimen- 
sions on which these complex stimuli 
varied.

Perhaps the most interesting as 
well as the most thorough of the 
studies on the effects of practice is
that by Noelting (in press) on the
Müller-Lyer illusion, investigating
changes in this illusion over a series
of 20 trials. Noelting found a pro-
gressive increase with age in such
changes, which tended to counteract
the illusion. The youngest groups
(ages five to seven) actually showed
either no improvement or an increase 
of the illusion with practice. A poly- 
nomial trend analysis of the error 
curves, furthermore, disclosed that 
there was curvilinearity at all age 
levels: the curves went through a 
minimum (or maximum), followed by 
a slight reversal. 

The interpretation of these age dif- 
ferences in perceptual learning must 
await a more adequate explanation of 
the nature of such learning effects, 
particularly since, occurring without 
benefit of external reinforcement, 
these cases of improvement with 
practice are not readily accounted for 
in associative terms (cf. Wohlwill, 
1958). One plausible hypothesis
might be that in at least some in-
stances these practice effects are at-
tributable to adaptation processes
(cf. Köhler and Fishback’s [1950] in-
terpretation of changes in the Müller-
Lyer illusion, in terms of satiation).
The available evidence on develop- 
mental changes in adaptation (Carini 
& Carini, 1958; Giannitrapani, 1958) 
indicates, however, rather a tendency 
for adaptation to decrease with age. 

Perceptual learning and perceptual 
development. A question relating to 
perceptual learning in a more general 
sense concerns its role in the develop- 
mental changes in perception which 
we have considered in this review. In 
what sense and to what extent are 
these changes interpretable as learn- 
ing effects? Unfortunately, few if 
any developmental studies have 
achieved the kind of control over Ss’ 
prior experience that would be re- 
quired for an answer to this question.
What would seem to be needed is an
attempt to manipulate perceptual 
experience, both through prolonged 
specific practice (as in the studies by 
Welch [1939a, 1939b]) and through 
exposure to different types of stim- 
ulus environments. There are obvi- 
ous limits to the feasibility of this 
type of approach with children, but 
developmental psychology can surely 
profit here, as elsewhere, from work 
of this type carried out on subhuman 
species (e.g., Gibson & Walk, 1956). 
Even without such experimental 
control, effects of learning are clearly 
demonstrable in at least one develop- 
mental change: the improvement in 
the tachistoscopic recognition of 
words and objects (Edgren, 1953;
Forgays, 1953; Hoffmann, 1927).
Here at least two experiential factors
may be specified: increasing famili-
arity with the particular objects and
words exposed, and, in the case of
the studies on word recognition by
Forgays and Hoffmann, the perfec-
tion of the visuomotor habits in-
volved in reading. The role of the
latter factor is shown in the sharp
rise in performance observed in both
of these studies between the second
and fourth grades of school, coincid...
intensified instruction ... is further
attested to ... development of selec...
for words exposed to the right of the
fixation point, demonstrated in
Forgays’ experiment. It need hardly
be p... that the generalizability of
specific learning effects ... areas
(... constancies) remains very much
open to question.


Discussion 


In the developmental changes
brought out in this review one may
discern, it seems, three distinctive
trends which appear to cut across a
variety of content areas. This con-
cluding section will be devoted to a
brief discussion of these trends, in
the hope that some of the theoretical
and experimental issues raised by
this review will emerge in clearer
focus.

Assimilation and contrast. The 
work on the illusions indicates that 
assimilation tendencies decrease with 
age, while contrast effects, at least of 
the temporal variety, increase. Fur- 
ther age changes which are probably 
relevant in this connection are the 
decreases with age in errors of antici- 
pation or starting-position effects, 
when the methods of limits or adjust- 
ment are used (e.g., Binet, 1895; 
Liebert & Rudel, 1959; Piaget & 
Lambercier, 1953; Wapner & Werner, 
1957), as well as the increase in com- 
pensatory tendencies such as those 
encountered in judgments of vertical- 
ity under conditions of body tilt. We 
will return to the significance of this 
trend for the general problem of per- 
ceptual development at the end of 
this section.
The spatial framework. The work 
on the role of spatial orientation in
form discrimination gives clear evi- 
dence that the younger child gen- 
erally fails to relate a stimulus to the 
spatial framework in which it ap- 
pears. The sharp decrease with age in 
errors of spatial localization and in 
judgments of verticality (in the ab- 
sence of conflicting kinesthetic cues), 
horizontality, and the like point in 
the same direction, as do the changes 
with respect to certain illusions, no-
tably increases up to a certain point
for certain forms of the horizontal-
vertical illusion and possibly for the
Schumann square as well. The sim-
ilar trends observed in the case of
projective-size judgments probably
represent another aspect of the same
process. In most of the work on this
question, however, the framework 
has been relatively remote (e.g., the 
walls of the room, the edges of the 
table) or altogether implicit; con- 
versely, in the rod-and-frame test of 
Witkin et al. (1954) involving a frame 
in close spatial proximity to the stim-
ulus being judged, the influence of
the framework is strongest at the
younger age levels. The actual role
of the spatial framework in percep-
tual development needs thus to be
clarified; one suggestion would be to
present stimuli within a near-by
framework which is itself in conflict
with the more remote but also more
constant one defined, e.g., by the
room.

The role of redundancy in the stim-
ulus. The age changes in the recogni-
tion of incomplete forms point to a
third important dimension of de-
velopmental change. Compared to
the adult, the young child requires
more redundancy in a pattern to per-
ceive it correctly; thus both incom-
plete and very complex patterns will
be difficult for him. By the same
token, wherever a stimulus permits of
... alternative modes of percep-
tion, that aspect involving the least
amount of information will be the one
attended to by the young child.
This interpretation is consonant with
the marked tendency of young chil-
dren to follow along continuous linear
... in their perception of shape
(as in the embedded-figure task)
and to focus on the general outline of
a figure rather than on internal detail.
It is consistent, furthermore, with
the low potency in early childhood
of form as a cue in matching and
stimulus equivalence experiments,
when it is in competition with
...ally more redundant cues such
as ... vs. solidity (Knoblauch,
...; Rüssel, 1931) and color (Colby
& Robertson, 1942). Lastly, consider
a phenomenon of an entirely different
order: the decreasing ... again the
younger child requires a greater
amount of surplus information, in the
sense of the simultaneous presence of
a variety of cues which are usually
correlated.

The problem of veridicality. The 
work on assimilation and contrast
effects points to a problem of con-
siderable importance for theories of
perceptual development: do changes 
in perception as a function of age or 
learning necessarily tend towards 
veridicality, as Gibson and Gibson 
(1955) have argued? Although start- 
ing from rather different premises 
from the Gibsons’, Piaget in his con- 
ception of perceptual development 
likewise emphasizes the increasing ac-
curacy of perception, resulting from
... Piaget’s studies ...
parallel-line illusions bear out this
view, demonstrating decreases with
age both in assimilation and in con- 
trast. In Piaget’s view, the main 
distinction which needs to be made 
is between spatial and temporal 
phenomena, since in the case of the 
latter the regulatory mechanisms are 
not given an opportunity to inter-
vene; thus the increase in successive
contrast with age in the Usnadze 
illusion (Piaget & Lambercier, 1944). 

Operating within the framework 
of sensory-tonic theory (Werner & 
Wapner, 1952), on the other hand, 
Wapner and Werner (1957) have ar- 
rived at a formulation which largely 
eschews any reference to veridicality 
in the developmental trends which 
it hypothesizes; these are rather 
specified in terms of the changing
interplay of a system of forces which
may operate as readily to increase 
error (as in the development of the 
compensatory mechanisms in judg- 
ments of verticality) as to reduce it 
(as in the decreasing effects of static- 
object stimulation, exemplified in the 
starting-position effects in the same 
situation). With respect to the illu- 
sions, Wapner and Werner similarly 
imply that assimilation effects de- 
crease with age, with a concomitant 
increase in the strength of contrast 
effects, as shown in their results for 
the Titchener-circles illusion. 

In the view of this writer, this 
problem represents one of the central 
unresolved questions in the area of 
perceptual development. Further 
and more thorough investigation of 
this question would provide us with 
the empirical basis for a more ade- 
quate conceptualization of the char- 
acter and direction of developmental 
changes, and would thus shed light on 
the relative merits of the models of 
perception proposed by Piaget and 
by Wapner and Werner. At the same 
time the clarification of age changes 
in assimilation and contrast should 
contribute to a fuller understanding 
of these perceptual processes and of 
their dimensionality—i.e., whether 
assimilation and contrast do in fact 
represent two sides of a single psy- 
chological continuum, as suggested 
in the work on anchoring effects 
(Sherif, Taub, & Hovland, 1958) and
on leveling and sharpening (Holz-
man & Klein, 1954).


If we have tended here to empha-
size the differences between the con- 
ceptions of perceptual development 
advanced by Piaget and his collab- 
orators on the one hand and by 
Wapner and Werner on the other, we 
should note, in conclusion, an in- 
teresting point of similarity in their 
respective modes of approach to the 
study of developmental changes in
perception. Both of these theories, 
however different in the types of con- 
structs with which they deal, concern 
themselves in the first place with 
certain basic processes of perceptual 
judgment, as they operate in the 
adult. Developmental changes are 
incorporated into these systems in an 
integral manner, being conceptua- 
lized in terms which bear a direct 
relationship to the basic constructs 
of the respective theories. Such an 
approach appears well worth emulat- 
ing in this field; it would in fact go 
far towards breaking down the bar-
riers between experimental and dif-
ferential psychology to which Cron- 
bach (1957) has pointed. In this con- 
nection we might reflect on the view 
expressed by Lewin, that “problems 
of individual differences, of age levels, 
and of general laws are closely inter- 
woven ... general laws and individ- 
ual differences are merely two as- 
pects of one problem; they are 
mutually dependent on each other 
and the study of one cannot proceed 
without the study of the other”
(Lewin, 1954, p. 921).


A NOTE ON PROJECTION


PHILIP H. CHASE 
Veterans Administration Hospital, St. Cloud, Minnesota 


Murstein and Pryer’s article on the 
concept of projection (1959) surely 
reflects laudable motives, and their 
review of relevant research contains 
a number of astute observations. 
There appear to be, however, rather 
glaring faults in formulation, cate- 
gorization and definition. 

Let us agree that “if ‘projection’ 
means everything it means nothing”
(Murray, 1951, p. 13). It is then 
difficult to see, except by Murstein 
and Pryer's final definition of projec- 
tion (1959, p. 370), why they quoted 
Zilboorg’s quote from the Malleus 
Maleficarum (Murstein & Pryer, 
1959, p. 353). Hallucination, not 
projection, is the term usually ap-
plied when “devils stir up the inner 
perceptions” so “that they appear to 
be a new impression—from exterior 
things.” Quotes from Murphy and 
Sears (Murstein & Pryer, 1959, p. 
354) are interesting but will be seen 
to bear more relation to “New Look” 
perception than to projection. We 
shall also see that the admonition 
above had no apparent effect on their 
final definition of projection. 

One can take issue with the au- 
thors’ classification system on the 
following grounds. Unless we wish 
to emasculate the term “projection,” 
it would seem appropriate to use it 
only when the process referred to in- 
volves the attribution of internal 
characteristics to some external per- 
son or objects. Inaccurate extension 
of the term may well have played a 
major role in the confusion recog- 
nized, but, nevertheless, added to by 
Murstein and Pryer. 

Projection is defined as “the proc- 
ess of unwittingly attributing one’s 
own traits, attitudes or subjective
processes to others” (English & Eng-
lish, 1958). In relation to this defini- 
tion the Murstein and Pryer classifi- 
cation of “attributive projection” is 
clearly redundant and relatively use- 
less for purposes of classification. A 
more useful classification would re- 
sult from considering the presumed 
purposes of the mechanism. 

Two major categories are immedi- 
ately obvious. We might term one 
type defensive projection and the 
other predictive projection. In the 
case of defensive projection the mech- 
anism is seen to operate in defense of 
the ego. One’s own unacceptable or 
denied characteristics are attributed 
to another. Murstein and Pryer’s 
“classical projection” clearly falls 
into this category, and we may dis- 
pense with the rather loaded term 
“classical.” In the case of predictive 
projection a problem in forecasting or 
describing the behavior or charac- 
teristics of others is the task of the 
individual. In the face of insufficient 
evidence about the objects of his pre- 
dictions or forecasts, the S tends to 
attribute some of his own character- 
istics to the object. The comments 
cited by Murstein and Pryer to sup- 
port their category of “attributive 
projection” all involve the predictive 
or forecasting aspect as the primary 
purpose of the mechanism. 

One can see that Murstein and 
Pryer are on tenuous ground when in 
discussing their “attributive projec- 
tion” they attempt to dispense with 
both the unconscious and the self- 
concept. They state that this cate- 
gory is not concerned with either 
concept. It can be easily argued that 
personality projection of any kind 
cannot occur without a self-concept
to project from, and that the self-
concept and projective process may
... the time it occurs, is largely uncon-
scious, ... the expense of regulation
by objec... Is it unrea... Murstein and
Pryer we would have to define all ...

The logic that prompted Murstein
and Pryer to categorize the ration...
rationality of his projection and also
in the service of the ego rationalizes
his projection, this is not neces... a
type of projection.

It was a pleasure to have the re-
search review made available to the
reader, and it was hoped that as the
...ever, they were unable to escape the
consequences of their earlier think-
ing. Their attempt to limit their defi-
nition ... “perception or judgments
having to do with the personality ...”
(Murstein & Pryer, 1959, p. 370) was
laudable even though earlier they
had dealt so cavalierly with the self-
concept. This reader was totally un-
prepared, however, for the statement
that projection is “the manifestation
of behavior by an individual which
indicates some emotional value or
need of the individual” (Murstein &
Pryer, 1959, p. 370). To accept this
definition would mean that, in spite
of Murray’s warning, we would have
to include as projection at least the
following short list: reaction forma-
tion, ...ion, repression, sub...,
...ssion, compensation, intro...,
compul..., with...

Let us still agree with Murstein
... “projection” ... tion and ...
achieved, ... endeavors to achieve
scientific status


IMPRINTING:
EMPIRICAL BASIS AND THEORETICAL SIGNIFICANCE¹


HOWARD MOLTZ 
Brooklyn College 


Interest in the effects of early ex- 
perience upon the behavior of adult 
animals has increased significantly 
in recent years as evidenced by in- 
vestigations of problems ranging 
from the genesis of filial behavior in 
aves to the development of tactile dis- 
crimination in primates. In general, 
this increase in interest has been due 
to a growing awareness that analysis 
of the role of early experience is of 
fundamental importance in under- 
standing many aspects of adult be- 
havior. Witness the amount of re- 
search that has been concerned with 
the influence of neonatal sensory dep- 
rivation on functions such as form 
discriminations, interocular transfer, 
and problem solving capacity. Con- 
sider also the number of studies de- 
signed to investigate the relation be- 
tween early social experience and the 
sexual and parental behavior of the 
adult organism. 

Although these studies have called 
attention to the effects of variations 
in neonatal experience on the on- 
togeny of a wide variety of verte- 
brate response patterns, we still lack 
any clear understanding of the man- 
ner in which these effects are exerted 
(Beach & Jaynes, 1954). For ex-
ample, which particular develop-
mental processes are involved when
the sensory environment of an ani-
mal is restricted such that it subse-
quently shows increased emotional
excitement (Melzack & Thompson,
1956) as well as decreased respon-
siveness to pain (Melzack & Scott,
1957)? That early experience is in-
fluential in structuring adult be-
havior can scarcely be doubted, but
the means by which this influence
operates is, at best, only incompletely
understood.

1 Research presented in this paper was sup-
ported by Research Grants M-1732 and M-
2417 from the National Institutes of Health,
Public Health Service. The writer wishes to
acknowledge with appreciation the many
helpful suggestions offered by Evelyn Raskin.

The results of a recent study by
Fantz (1957) suggest that desire for
increased understanding of the man- 
ner in which intraorganic processes 
and extrinsic stimulus conditions in- 
teract in ontogeny can, in some 
cases, be better served by employing 
species-specific responses as indica- 
tors of this interaction rather than 
behavior established through instru- 
mental conditioning. A species- 
specific pattern suitable in this re- 
spect would be one that occurs with 
predictable regularity very early in 
the life of the organism and which, 
although stereotyped in its expres- 
sion, is nevertheless liable to exten- 
sive modification by manipulation of 
the sensory environment. The phe- 
nomenon which Lorenz (1935) desig-
nated as “imprinting” appears to
meet these requirements. As to the 
importance of studying imprinting 
for understanding the effects of early 
experience on the organization of be- 
havior, Thorpe (1951) writes: “It 
needs, and would repay full and pre- 
cise experimental investigation al- 
most more than any other aspect of 
animal behaviour” (p. 256). 

It is the purpose of the present 
paper to attempt a critical evalua- 
tion of the research literature con- 
cerned with imprinting and, on the
basis of this evaluation, to suggest
hypotheses for further empirical
study.


SOME DEFINITIONS OF IMPRINTING


In 1873 Spalding reported that in-
cubator-hatched chicks tended to
follow persistently the first moving 
object to which they were exposed. 
Heinroth (1910) subsequently called 
attention to this phenomenon when 
he reported that graylag geese can 
be made to respond to humans in 
filial fashion in preference to adults 
of their own species if they are ex- 
posed to humans just after hatching. 

... years later, Lorenz (1935),
concerned with analyzing the func-
tional ... of stimuli ...
inanimate objects can acquire, in the
absence of any conventional rein-
forcing agent, the capacity to evoke
certain aspects of behavior that are
ordinarily directed toward members
...

Since the conditions under which
an object can acquire this capacity,
and the characteristics of the be-
havior thus evoked, were considered
unique, Lorenz designated the proc-
ess involved by a special term,
imprinting. ...

...sumed to be in ... physiological
development. Thus if
a graylag gosling, for example, is to
respond to a human (or any other 
relatively large moving object) in 
filial fashion, initial exposure to the 
human must occur within several 
hours after hatching. The effect of 
this exposure is at first shown by the 
fact that the gosling persistently fol- 
lows its “surrogate parent.” 

2. Once imprinting occurs its ef- 
fect is irreversible. That is, the stim- 
ulus to which the animal is exposed 
during the brief critical period hence- 
forth becomes either the preferred 
stimulus or the only stimulus toward 
which the following response will be 
directed. This preference, once it is 
established, remains extremely stable,
presumably throughout the life of
the animal. Thus with reference to 
graylag geese, Lorenz (1957) states: 
“Once their instinctive social reac- 
tions are transposed to a human 
being, their behavior does not change 
in the least even if they are kept for 
years with other members of their
own species and without human
company” (p. 105). While this state- 
ment is perhaps too extreme to be 
taken as representative of Lorenz’ 
views on the question of stability, 
there is no doubt that he believed 
imprinting produces modifications in
... extraordinarily ...

3. Lorenz pointed out that many
aspects of adult behavior that are not 
functional in the neonatal period 
during which imprinting is operative 
will nonetheless be directed toward 
the “imprinted object” when func- 
tional status is subsequently achieved. 
The fact that exposure to an object 
early in ontogeny can influence, in so
specific a fashion, as yet undeveloped
response patterns was considered
characteristic of the imprinting proc- 
ess. Lorenz cited the case of adult 
shell parakeets that directed their
courtship activities toward humans
in preference to available members of
the species as a consequence of
having been “imprinted” to humans.
This aberrant sexual behavior was
held to be the result of imprinting and
not of conditioning since, as Lorenz
(1955) maintained, “... you cannot
condition any not-yet-functioning re-
sponse in the ordinary way” (p. 209).

4. When a bird becomes “im-
printed” to a particular stimulus it
will readily transfer its following re- 
sponse as well as other social re- 
sponses to all members of the class to 
which that stimulus belongs. In 
other words, imprinting was assumed 
to result in attachment not to the 
specific features of an object but to 
its general characteristics. Thus 
when a gosling, for example, is 
hatched under a muscovy duck, it 
subsequently follows not only its 
particular “foster parent” but all 
members of the species to which its 
“parent” belongs. Lorenz conceived
of imprinting (as it operates in na-
ture) as a method of acquiring a
“consciousness of species”; a method
that insures that social behavior will 
extend beyond the parent but not 
beyond the species.

Many contemporary ethologists
would decline to draw as sharp a line 
between imprinting and associative 
learning as Lorenz had done earlier. 
Indeed, in a recent publication Lo- 
renz (1955) maintained that im- 
printing and conditioning should per- 
haps be considered as continuous 
rather than as distinct processes. 
But while the attempt to distinguish 
in an absolute manner between im-
printing and conditioning has been
abandoned, there nevertheless re- 
mains an emphasis on the innate 
aspects of the imprinting process.
Thorpe (1956), for example, con- 
ceives of imprinting as functioning to
complete for the animal “... the
processes initiated by its inherited
... time of hatch... ling possess a neurosen-
sory mechanism that is initially re-
sponsive not only to stimuli pro- 
vided by the parent but to stimuli 
provided by a variety of moving ob- 
jects that are quite unlike the parent. 
Since, as Thorpe assumes, the ac- 
tivation of this mechanism functions 
to release the following response, its 
lack of selectivity renders the social 
behavior of the animal potentially 
susceptible to control by almost any 
relatively large object that possesses 
the quality of motion. It is obvious 
that this condition, if it were to con- 
tinue, could hardly prove biologically 
adaptive, so that early in ontogeny 
the animal must acquire a preference 
for a particular class of objects to- 
ward which to direct its social be- 
havior. Exposure to any moving ob- 
ject during the critical period serves 
to establish this preference by mak- 
ing the releasing mechanism more se- 
lective so that henceforth it will be 
activated only by that object or by 
objects of similar configuration. Fa- 
bricius (1951a) and Ramsay (1951) 
also consider imprinting to be the 
process that increases the selectivity 
of the releasing mechanism, insuring 
that the social behavior of the animal 
will be evoked by a relatively limited 
range of stimuli. 

Lorenz (1935) was the first to 
point out that there were marked 
differences among avian species with 
respect to their “imprintability” and 
was the first to ascribe these differ- 
ences to genetically determined vari- 
ations in the specificity of a hypo- 
thetical releasing mechanism. Lo- 
renz (1957) emphasized that some 
birds can be imprinted only to mem- 
bers of the species since their
“innate perceptory patterns are
so highly differentiated” (p. 267) that 
the following response cannot be re- 
leased by any other object. Thus cur- 
lews and great godwits, for example, 
cannot be made to direct their social 
behavior to such biologically inap-
propriate objects as a human or a
cardboard box. There are many 
other avian species, however (e.g., 
graylag goose, and mallard duck), 
which can be imprinted readily to 
moving objects that differ greatly
in size and shape. This is assumed 
to be due to the fact that in these 
species the structure of the releasing 
mechanism at the time of hatching is 
differentiated only to the extent that 
it can be activated by moving objects
and not by stationary objects.

The conception of imprinting as a 
process that functions to increase the
specificity of a hypothetical releas-
ing mechanism carries with it the re-
sponsibility of specifying precisely 
the behavioral events to which im- 
printing is functionally related. The 
attempt to establish these relation- 
ships has involved the use of such 
disparate responses as following an 
object during the neonatal period and 
courting the same object in later life. 
Indeed, Thorpe has suggested that 
recognition of territory (Thorpe, 
quisition of a particular 
h association with mem- 
species (Thorpe, 1956) 
be considered as the re-

deal primarily with the following response
and, in addition, will use the
term imprinting to denote a particular
experimental operation and not a
process or a mechanism. It is re- 
alized that by proceeding in this way 
the uniqueness and, indeed, the 
“flavor” of imprinting become some- 
what “attenuated” but (as will be 
shown) such attenuation is believed 
necessary in view of the paucity of
available empirical evidence. Thus,
imprinting will be defined as the procedure
of visually presenting to an
animal a large moving object during 
the first several hours of its life 
under conditions that insure that the 
object is not associated with such 
conventional reinforcing agents as 
food and water. This procedure has 
been found to evoke a close following 
of the object in such precocial avian 
species as ducks, geese, coots, moorhens,
and domestic fowl. However,
imprinting, as defined above, has not
been unequivocally shown to induce
following either in passerine birds or
in mammals.


Although Hediger (1950) and
Gray (1958) have reported
imprinting in buffaloes,
sheep, and guinea pigs, their findings
require experimental confirmation 
under more carefully controlled con- 
ditions before they can be generally 
accepted. To broaden the definition 
of imprinting so that the present dis- 
cussion might encompass these spe- 
cies and perhaps, in addition, include 
a greater variety of behavior patterns 
is not likely to prove theoretically 
salutary. It need hardly be added 
that systematic significance is not 
necessarily precluded by declining to 
consider the entire spectrum of be- 
havior that has been assumed (but 
not demonstrated) to be susceptible
to modification by the imprinting
procedure. 


IMPRINTING AND THE CRITICAL 
PERIOD 


One reason that imprinting has 
generated so much interest is that the 
effectiveness of the procedure appears 
to be restricted to a specific period of 
development. It will be recalled that 
Lorenz maintained that if an object
is subsequently to elicit following,
initial exposure to that object must 
occur within a relatively brief period 
after hatching; a period during which 
the bird was assumed to be in a criti- 
cal stage of physiological develop- 
ment. Recently, several investigators 
have attempted to obtain evidence 
regarding the range of the critical pe- 
riod. For example, Fabricius (1951b) 
reports that in tufted ducks following 
can be elicited only if initial exposure 
occurs before 12 hours of age, while 
in the mallard, readiness to follow a 
strange moving object was still ap- 
parent in some cases after exposure 
was delayed as long as 72 hours. 
Even in the mallard, however, the 
probability of the animal following a 
strange object was found to be in- 
versely related to the age at which 
initial exposure occurred. Alley and 
Boyd (1950) have shown that for the 
coot imprinting is most likely to 
prove effective if it is administered 
within the first eight hours from the 
occurrence of hatching. Ramsay and 
Hess (1954) found that for both mal- 
lard ducklings and Bantam chicks, 
13-16 hours after hatching appeared 
to constitute the critical period. 
They reported that as little as 10 min- 
utes of exposure to a male model de- 
coy during this time was sufficient to 
produce a strong approach response
in a subsequent choice test that involved
the familiar model and an unfamiliar
female decoy. The incidence
of correct choices was found to decrease
sharply when initial exposure
occurred either before 13 hours or 
after 16 hours. 

While there is undoubtedly a criti- 
cal or sensitive period during which 
exposure to a moving object is most 
effective with respect to inducing 
subsequent following, it is nonethe- 
less possible to elicit the response in 
some cases even if initial exposure is 
delayed long past the period that has 
been designated as “critical” for a 
given species. Steven (1955), for example,
reports that a lesser white-fronted
goose that was captured wild
when between one and two weeks of 
age soon developed a strong follow- 
ing response to man. Under con- 
trolled laboratory conditions, Moltz 
and Rosenblum (unpublished) found 
that of 16 Peking ducks kept in iso- 
lation for the first three days of life, 
five showed a strong following re- 
sponse after exposure on the fourth 
day to either a moving cardboard box 
or a model decoy. However, when 
initial exposure occurred during the 
critical period (i.e., 8-10 hours after
hatching) 14 of 16 birds displayed 
vigorous following. After establish- 
ing the fact that in domestic chicks 
the critical period appeared to be con- 
fined to within the first 24 hours after 
the occurrence of hatching, Jaynes 
(1957) was able to induce strong 
following of a cardboard cube in inex- 
perienced chicks as old as six days 
by exposing them to the cube for sev- 
eral consecutive hours. While the 
original report by Spalding on the 
following of domestic chicks contains 
only a few procedural details, it is 
apparent that he observed close fol- 
lowing in some birds on the third day 
of life after removing the opaque 
hoods which these animals had been 
wearing from the moment of hatching.

Hinde, Thorpe, and Vince (1956)
reported that as initial exposure to a 
stimulus object was delayed, moor- 
hens became progressively less likely 
to follow, while coots were not as 
markedly affected. Indeed, it was 
found possible to induce following in 
some coots even after a delay of six
days. These authors suggested that 
the difference between the two spe- 
cies with respect to the age at which 
following can be established is due to 
the fact that the moorhen, unlike the 
coot, exhibits early in its development
a tendency to flee from strange
moving objects. The appearance of 
the flight response was considered 
responsible for limiting the critical 
period in moorhens to within several
hours after hatching. Since the coot
first manifests this response at a
somewhat later stage of development,
it shows greater flexibility with
respect to the age at which a strange
object can elicit following. Hinde
(1955b) concluded that the critical
period is not a property of an inferred
imprinting process but a consequence
of the appearance of response
tendencies incompatible with following.


o in relative rather than 


T as rigi 
Lorenz implied,

organism. To be sure, even if the
critical period were less flexible than 
it appears to be, this lack of flexibility 
would not, in itself, provide support 
for the argument that the critical pe- 
riod is the product of a genetically 
determined condition that arises in- 
dependently of early experience. An 
animal might be responsive to cer- 
tain extrinsic stimuli at one particu- 
lar stage of ontogeny and be totally 
unresponsive at both an earlier and 
later stage, not because of the unfolding
of some “innate growth plan,”
but because of changes brought about
through the progressive interaction
between the developing organism and 
its sensory environment (Schneirla, 
1956). It is in terms of this interac- 
tion that the attempt will be made 
subsequently to account for the critical
period and for the initial appearance
of the following response.


THE “IRREVERSIBILITY” OF 
THE FOLLOWING RESPONSE 


It will be recalled that Lorenz contended
that imprinting produces irreversible
modifications in behavior
and that this is one of its distinguish- 
ing features. The display of abnormal 
sexual fixations by adult birds which 
earlier in life had been imprinted to 
man, or to other “heterospecific par- 
ents,” has frequently been adduced in 
support of this contention. The cir- 
cumstances under which this evi- 
dence has been obtained, however, 
leave much to be desired with respect 
to the exercise of sufficient experimental
control. Hinde (1955b), for
example, states that “No really ade- 
quate experiments seem to have been 
done here, however, for in all cases 
known to the writer the birds were 
continually in the presence of man
throughout their pre-adult life so that
the attachment of the sexual response
to man, though perhaps influenced
indirectly by the early experience,
could have been due to a
later learning process” (p. 21). With
reference to these sexual fixations,
Thorpe (1956) has also stated that
“. . . since special precautions
against reinforcement by subsequent 
conditioning were not taken, this 
must have played a large, perhaps 
very large, part in the later course 
and strength of the phenomenon” 
(p. 358). The only studies relevant to 
the problem of “irreversibility” that 
have been performed with reasonable 
precision are those that have dealt
with the following response. It is 
these studies to which attention will 
be directed. 

The statement that imprinting 
produces behavioral modifications 
that are irreversible can be inter- 
preted in two ways: (a) once expo- 
sure to a particular object induces 
following, a markedly dissimilar ob- 
ject cannot do so, and (b) following a 
familiar object will continue to occur 
without any significant decrement in 
response strength. The first interpretation
is concerned with the problem
of stimulus generalization, while
the second is concerned with the
problem of response stability during
successive presentations of the same
object. For convenience, they will be 
treated independently. 


Generalization 


Only a few studies have been reported
that are relevant to the problem
of generalization and of these,
none has attempted to vary systematically
the generalization stimuli.
The major concern has been whether
birds that have followed a particular
object will subsequently follow an
unfamiliar object when the latter is
presented alone or in combination
with the familiar object. Since the
relation between familiar and unfamiliar
was not ordered along a
meaningful continuum, no
precise generalization
function can be specified.

Fabricius and Boyd (1952-1953) 
exposed one group of mallard ducklings
to a balloon and a second group
to a brown box within the first
24 hours from the occurrence of
hatching. After several days, during 
which time repeated presentation of 
the familiar object occurred, the 
birds were exposed to the unfamiliar 
object either alone or in combination 
with the familiar one. Fabricius and 
Boyd reported that several birds fol- 
lowed the unfamiliar object, but in all 
cases the “response was less intense.”
Unfortunately, no data were presented
which would permit one to determine
the relation between initial
response strength and the magnitude
of the generalization decrement. 
Using coots and moorhens, Hinde, 
Thorpe, and Vince (1956) conducted 
a number of generalization tests, em- 
ploying objects as different from the 
training object as a box moving along 
a wire differs from a man walking. 
The experimental procedure involved 
presentation of the generalization ob- 
ject either immediately following 
initial exposure or after a number of
opportunities to respond to the train- 
ing object had been given. The re- 
sults showed that many birds fol- 
lowed the generalization model on 
the first occasion on which it was 
presented, although following was 
frequently hesitant and unstable. 
There is no doubt, however, that 
generalization occurred and that re- 
sponse to an unfamiliar model was a 
positive function of the strength 
with which the training model was 
followed. Hinde, Thorpe, and Vince 
concluded that imprinting does not 
“irreversibly modify” behavior, since 
following one object does not preclude
subsequent following of markedly
dissimilar objects.

Jaynes (1956) has presented data
on domestic chicks which agree in 
most respects with the data obtained 
from coots and moorhens. The pro- 
cedure Jaynes employed consisted of 
exposing newly hatched chicks to one 
of two dissimilar cardboard objects 
(i.e., a green cube or a red cylinder)
that moved irregularly about an al- 
ley. After following had been well 
established, the unfamiliar object 
In all cases a

=—.07). Jaynes also reported inability
to predict magnitude of this
decrement from knowledge of the
strength with which the familiar object
was previously followed. It will
be recalled that, in this respect, the 
results of Hinde, Thorpe, and Vince 
are contradictory. The fact that the 
generalization decrement exhibited 
by coots and moorhens covaried with 
initial response level, while that ex- 
hibited by Peking ducks and do- 
mestic chicks did not, cannot be ex- 
plained on the basis of available 
empirical evidence.

It is obvious that a systematic in- 
vestigation of imprinting in relation 
to the generalization phenomenon is 
required. Future research might be 
directed toward trying to order stimuli
along some meaningful continuum
(such as size or shape) so that a
more precise generalization function
can be specified. In addition, the 
relation between variations in the 
imprinting Proced 
of exposure, 
lowing, etc.) 
ity of the bj 


Parameters, th 
used to test for

the contention that once exposure to
a particular object induces following, 
subsequent following of dissimilar ob- 
jects is precluded. The second inter- 
pretation is that following a familiar 
object will continue to occur without 
any significant decrement in response 
strength. 

Before proceeding to discuss the 
problem of response stability, it 
should be recalled that one of the 
conditions specified in defining im- 
printing is that the object to which 
the bird is exposed initially is not to 
be associated with any conventional 
reinforcing agent. This condition 
should also be made to hold for all 
subsequent exposures. If it can be 
shown that following remains stable 
in the absence of any obvious reward, 
then the attempt to identify the 
factors involved in maintaining the 
response is likely to prove systemati- 
cally significant. On the other hand, 
if following results in the receipt of 
food or water (or if the object fol- 
lowed acquires secondary reward 
value through association with such 
reinforcing events) the problem of 
response stability becomes trivial. 
Consequently, the analysis below 
will be restricted to an examination 
of those studies concerned with de- 
termining the functional course of the 
following response under conditions 
that appear to involve neither pri- 
mary nor secondary reinforcement. 
However, it should be mentioned 
that in some cases it was difficult to 
determine the extent to which rein- 
forcement had been involved. Many 
“imprinting studies” have been car- 
ried out under seminatural condi- 
tions and reports of these studies of- 
ten do not supply sufficient detail 
concerning the method employed. 
Especially when the experimenter 
was the object followed (as in Lorenz’
early work), it becomes pertinent
to inquire whether the animals were
fed by the experimenter and, if so, the
extent to which he became “contaminated”
with reinforcement. The
studies that follow seem to be suffi- 
ciently free from such contamina- 
tion. 

Fabricius (1951a) reported that 
although he was able to establish a 
strong following response in tufted 
ducks, shovellers, and eiders, the fol- 
lowing response “gradually dimin- 
ished” beginning at about three 
weeks of age. Nice (1953) employed 
essentially the same procedure as 
Fabricius, and also found that in 
shovellers following began to wane 
at approximately the same age. Al- 
though in many respects Fabricius’ 
study represented a pioneer effort, 
it was performed under conditions 
that left much to be desired with re- 
gard to the degree to which the bird’s 
experience in the test situation was 
controlled. Under somewhat more 
adequately controlled test condi- 
tions, Fabricius and Boyd (1952- 
1953) found that mallard ducklings, 
within the first 10 days of age, ceased 
following a moving box to which they 
had previously shown strong “attachment.”
Fabricius and Boyd contended
that this rapid decrement in
response strength was in part a function
of the fact that the animals were
housed collectively. Why this should 
interfere with following was not made 
clear. 

Following appears to be somewhat 
more stable in coots and moorhens, 
for Hinde, Thorpe, and Vince (1956) 
observed that a decrease in response 
strength first became evident after 
four to five weeks of testing in moor- 
hens and not until seven to eight 
weeks in coots. There was no doubt, 
however, that even the intense fol- 
lowing of the coot began to wane 
and that “by the end of the juvenile
phase was lost altogether” (Thorpe,
1956, p. 363).

Moltz and Rosenblum (1958a)
attempted to determine the func- 
tional course of the following re- 
sponse in Peking ducks under labora- 
tory conditions that remained con- 
stant over successive test trials. Each 
bird was tested for 25 minutes per 
day for a period of 15 days. The ob- 
ject was a cardboard cube that de- 
scribed a fixed pattern of motion 
about an enclosed alley. The results 
indicated that following tended to 
reach a maximum by the third trial 
but shortly thereafter a progressive
decrement in the strength of the response
occurred. Activity irrelevant
to following began to appear during
either the sixth or seventh trial and
increased in frequency during each
successive exposure. Toward the end
of the test series, the majority of birds
not only failed to follow but evinced
no interest in the object as it passed
ehavior is to 
e strong and 
g that was evi- 


the strength

However, since imprinting, uncontaminated
by reward learning, has not
been shown to affect directly any
other response system of the adult
animal, this possibility remains unexplored.


ANXIETY AND THE STABILITY OF 
THE FOLLOWING RESPONSE 


The fact that a number of studies 
show that intense following exhibited 
during the early test trials begins to 
decrease, is puzzling in view of the 
conditions under which the response 
was evinced. The decrement is obvi- 
ously not due to the withdrawal of 
reinforcement since, at least in the 
obvious sense of the term, no rein- 
forcement was ever present. Nor 
does the decrement appear to be a 
function of any other change in experimental
conditions. In the study
reported by Moltz and Rosenblum
(1958a), for example, the animals
were maintained in individual cages 
from the time of hatching and 
throughout the experiment the only 
moving stimuli to which any animal 
was exposed were provided by its 
own body and by the test object. 
Furthermore, the conditions under 
which the study was performed made 
it possible to insure that the stimulus 
situation did not change over successive
trials. Despite these precautions,
however, the strength of the following
response decreased progressively
during the last half of the test series. 
The mechanism mediating this decre- 
ment requires some discussion. 

Several investigators have commented
that, beginning at approximately
25 or 30 hours from the occurrence
of hatching, precocial birds
will frequently exhibit “anxiety” or
“fear” in response to unfamiliar aspects
of the environment. Ramsay
and Hess (1954), for example, noted 
that “fear responses” were a characteristic
feature of mallard behavior
and Fabricius (1951a) reported
the same phenomenon in tufted 
ducks and eiders. Hinde, Thorpe, 
and Vince (1956) speak of the fleeing 
response in moorhens and Jaynes 
(1958a) found the same behavior in 
the domestic chick. The writer has 
also observed that the Peking duck 
responds in a similar manner to a 
wide variety of unfamiliar stimuli. 
It was noted, however, that this be- 
havior was not present while the bird 
was following. Indeed, when in close 
proximity to the familiar test object, 
even the sudden introduction of a 
loud sound would sometimes fail to 
elicit any sign of fear. The impression 
was gained that following served to 
decrease anxiety and that the re- 
sponse would continue to occur only 
as long as anxiety continued to be 
aroused by the test situation.

Anxiety will be defined as an internal
emotional state of the organism,
this inferred state being indexed
by such observable events as distress 
calls, startle behavior, defecation, 
etc. On the assumption that strength 
of following is some positive func- 
tion of this test-aroused anxiety, it 
appears reasonable to expect that any 
experimental procedure designed to 
reduce anxiety would reduce the 
strength of the following response. 
Previous research (Moltz, 1954) has 
indicated that forcing an animal to 
remain in a situation in which anxiety
has developed will either decrease or 
eliminate that anxiety. Accordingly, 
we can now formulate the following 
specific hypothesis: exposing duck- 
lings to a test situation in the absence 
of the object to which a strong follow- 
ing response had been previously at- 
tached will subsequently produce a 
decrement in the strength of that response.

Evidence supporting this hypothesis
was recently reported by Moltz
and Rosenblum (1958b). They exposed
Peking ducks for 25 minutes
per day for three days to a cardboard 
cube that moved about an enclosed 
alley. Those birds which showed evi- 
dence of strong following were re- 
tained for further study and assigned 
to either an experimental or a con- 
trol treatment. Beginning on Day 4, 
the experimental Ss were given daily 
1-hour habituation sessions, each ses- 
sion consisting of placing S in the al- 
ley in the absence of the object. At 
the conclusion of each session the 
object was returned to the alley and 
the bird was permitted to follow. 
The control Ss were treated in the 
same manner as the experimental Ss 
except that beginning on Day 4 the 
control Ss were placed in a discrim- 
inably different situation for one 
hour each day. The results indicated 
that the following exhibited by the 
experimental Ss decreased consider- 
ably more rapidly than the following 
exhibited by the control Ss. There 
was little doubt that opportunity to 
habituate to the experimental situa- 
tion resulted in a marked reduction 
in the strength of the following response.

There is another implication of the
general hypothesis that strength of
following is some positive function of 
the anxiety level present during the 
period of exposure to the test object. 
The implication is that an experi- 
mental procedure designed to in- 
crease anxiety during the test period 
should increase following. It ap- 
pears reasonable to assume that elec- 
tric shock, administered in the ab- 
sence of the test object but in the 
presence of stimuli previously asso- 
ciated with following, will produce 
greater anxiety arousal during subse- 
quent presentation of the object than 
if the shock had been administered in
the presence of discriminably different
stimuli. On this basis we can
formulate a second hypothesis: the 
greater the similarity between cues 
present during shock and those pres- 
ent during exposure to a test object, 
the greater the strength at which the 
following of that object will be 
maintained. 

Evidence relevant to this hypothe- 
sis has been presented recently by 
Moltz, Rosenblum, and Halikas 
(1959). Peking ducks were again ex- 
posed to a cardboard cube for 25 min- 


regularly scheduled exposure of Day 
7, the object was removed from the 
igned to one 
Each 


rods for the 
purpose of delivering shock. One


half of the Ss was confined in the 
compartment when the latter was in 
the alley; the remaining one half was 
confined in the compartment when 
it was outside the alley. The Ss were 
further divided into two additional 
subgroups. 


series of brief but intense shocks dur- 
ing the peri 


quently emitted 
tress calls when th 


the object and would often become 
startled suddenly for no apparent 
reason. This startle behavior and the 
defecation that frequently accom- 
panied it were exhibited considerably 
less often by the other animals dur- 
ing their exposures to the test object. 
These results are clearly in accord 
with theoretical expectation, since 
they suggest that the anxiety level 
present during the period of exposure
to the test object is an important
variable determining the strength 
at which following will be maintained.

Of course, many other implications 
of the hypothesis that strength of 
following is related to anxiety need 
be examined before the parameters of 
this relationship become clear. For 
example, we would want to deter- 
mine whether strength of following 
is a linear function of anxiety or 
whether increasing the level of anxi- 
ety would, at some point, result in the 
disorganization of the response. We 
would also want to know the extent 
to which level of anxiety affects the 
shape of the generalization gradient. 
In addition, it should be noted that 
experiments designed to test the hy- 
pothesis have employed Peking ducks 
exclusively, so that a problem arises 
concerning the generality of the results.
It is conceivable that there are
differences among precocial avian 
species with regard to the role of anxi- 
ety in maintaining the strength of 
following. Despite the fact that the
following behavior of one species ap- 
pears to resemble closely that of an- 
other (Collias & Collias, 1956), many 
interspecies comparisons will have to
be made before we can conclude that
following is influenced by the same
variables and serves the same adaptive
function in all species in which it
occurs. Interesting problems in the
phylogeny of behavior would be presented
if it were found that closely
related species differed markedly in 
the extent to which variations in 
stimulus conditions affected their 
tendency to maintain the response 
once it was established. 


WHY FOLLOWING OCCURS


Lorenz’ early statement concerning
the uniqueness of what he designated
as the imprinting process has
been found to be largely unsupported 
by experimental evidence. We have 
seen that while the procedure of visually
presenting a moving object to a
precocial bird of a certain age will al- 
most invariably induce following, 
there is no empirical justification for 
assuming that the procedure, as 
such, directly affects any other re- 
sponse system. We have also seen 
that the acquisition of the following 
response is neither rigidly restricted 
to a specific period in ontogeny nor
irreversible once it is established. In 
addition, we have been able to pre- 
sent experimental evidence which 
suggests a positive relationship be- 
tween level of anxiety and the 
strength at which the following re- 
sponse is maintained. However, we 
have not as yet discussed what is cer- 
tainly the most intriguing aspect of 
the imprinting phenomenon: namely, 
why following occurs at all. 

Lorenz’ answer to this question
involves a hypothetical releasing
mechanism, the initial selectivity of
which is assumed to be genetically
determined. However, as yet no 
empirical evidence has been pro- 
vided which, in the writer's opinion, 
supports the assumption that such a 
mechanism is involved in the de- 
velopment of the following response. 


Indeed, Schneirla maintains that,
in general, “claims for innate perceptual
schemata have not been
validated in adequate experimental
investigations” (1957, p. 95). Therefore,
disavowing the concept of the
releasing mechanism, we turn to ex- 
amine those processes and events in 
ontogeny which appear likely to contribute
to the development and organization
of the following response.

In order to provide a framework for
the present analysis, let us consider
the conditions under which following
has been studied in the writer's
laboratory and the behavioral events
most frequently associated with the
occurrence of the response.

The apparatus that has been em- 
ployed consisted of a well-lighted 
wooden alley approximately 10 feet
long. A motor connected to a pulley 
system was used to drive the test ob- 
ject about the alley at a constant 
speed. A duckling between 8 and 10 
hours old when placed in this ap- 
paratus for the first test trial, usually 
exhibits very little locomotion. The 
animal frequently drowses and ap- 
pears to attend to the object only 
when it passes near him. During this 
time none of the conspicuous signs 
of anxiety is present. However, when 
the animal is reintroduced to the ap- 
paratus 24 hours later for the start 
of the second 25-minute test trial, it 
emits many distress calls and fre- 
quently runs about the alley for a 
brief period before beginning to fol- 
low. Once following begins, the 
strength of the response increases 
progressively until the duckling 
comes to devote the entire period to 
pursuing the test object. As long as 
it remains in close proximity to the 
object neither distress calls nor 
startle responses are usually emitted. 

What we have just described is the 
sequence of behavior most often observed
to accompany the development
of the following response. Variations
in this sequence have, of
course, also been observed. For example,
some ducklings do not follow
at all, showing a marked avoidance of 
the object that persists over several 
trials. Others follow sporadically and 
never appear to exhibit any signs of 
emotional arousal when placed in the 
alley. Some animals will follow dur- 
ing the last few minutes of the first 
trial and begin to follow immediately 
at the start of the second trial. Al- 
though these ducklings exhibit the 
sequence of behavior most frequently
observed to precede following, the se- 
quence occurs more rapidly. The ex- 
e variations are due 
Physiological and 
behavioral development at the time 
of hatching remains to be deter- 
i what is most needed 
are studies of the 
vity patterns of the 
relate to the early 
havior of the neon- 


5 haps increased under- 
standing of the variability involved 


in the onset and termination of the 
hatching process is what is required. 
Certainly, a more precise baseline for 
dating the critical period than that 
provided by emergence from the egg


tend to make for 


further empirical developments per- 
mit, 


Anxiety and the Development of Following


In addition to the writer, several
investigators have called attention
to the fact that during a relatively
brief period after the occurrence of
hatching precocial birds do not exhibit
any of the conspicuous signs of
emotionality that characterize their 
later behavior. During this period 
there is an almost complete absence 
of such response indices of anxiety as 
distress calls, avoidance reactions, 
defecation, etc. It will be recalled 
that it is precisely at this stage of be- 
havioral development (the critical 
period) that the bird is exposed ini- 
tially to a moving object. Now Men- 
ner (1938) has pointed out that the 
fluctuation of retinal illumination 
that a moving object normally pro- 
duces in the mammalian eye is mag- 
nified in the avian eye by its posses- 
sion of a highly plicated pecten that 
projects into the vitreous humor
from the point of entrance of the
optic nerve. The shadows cast on the
retina by the plications of this pecten
function to enhance sensitivity to
movement [...] of retinal flicker
(i.e., [...]). [...] than at any other
stage in ontogeny, the fact that the
bird does possess this sensitivity at a
time when its anxiety level is low
makes the critical period important
with respect to the organization of
the following response. Its impor-
tance seems to be related entirely to
the fact that it provides the occasion
for the conjunction or association of
a low-anxiety drive and an attention-
evoking object. If this is indeed the
case, then it does not appear unrea-
sonable to suggest that as a conse-
quence of this conjunction the object
acquires the capacity to elicit cer-
tain autonomically controlled com-
ponents of the drive state.

A low-anxiety state can be thought
of as involving a particular constella-
tion of [...] visceral and cardiac activities,
a fraction of which can become
conditioned to previously neutral
stimuli. It is assumed that during the
critical period the (classical) condi- 
tioning of these implicit low-anxiety 
reactions occurs simply by virtue of 
their association with the object and 
does not in any way depend upon re- 
inforcement. However, as a result of 
this conditioning the object acquires 
the capacity to function as a rein- 
forcer, henceforth mediating new 
learning. Specifically, it is assumed 
that when anxiety is subsequently 
aroused, any response instrumental 
in bringing the animal into contact 
with the familiar object will be 
closely followed in time by anxiety 
reduction due to the previously ac- 
quired capacity of the object to elicit 
responses incompatible with anxiety. 
Vigorous pursuit of the object should 
then continue to occur until either 
the stimulus situation ceases to 
arouse anxiety or the reward value 
of the object is extinguished. The re- 
lation between anxiety level and the 
number of trials over which strong 
following is maintained has already 
been discussed. 

We have stressed the fact that dur- 
ing the period of initial exposure to 
the alley and to the test object the 
bird shows almost no signs of emo- 
tional arousal, but that during subse- 
quent exposures distress calls and 
startle responses become prominent 
features of its behavior. A problem 
arises as to why these anxiety re- 
sponses occur. 

It will be recalled that during the 
period of initial exposure the bird is 
most often observed to remain in one 
section of the alley and to drowse at 
frequent intervals. Relatively few 
features of the alley, including the 
moving test object, seem to com-
mand the animal's attention. The ob-
ject appears to dominate the visual
environment, while many other stim-
ulus aspects of the apparatus remain
“unexplored” and hence unfamiliar.
Twenty-four hours later, however, 
the bird’s ability to move about has 
improved considerably and at the 
start of the second trial it not only 
appears more alert but frequently 
runs about the alley. It does not 
seem unlikely that the diffuse emo- 
tional excitement that the bird ex- 
hibits at this time occurs primarily 
in response to the unfamiliar features 
of the alley environment with which 
it now comes into contact. 

Recent studies (Hebb, 1946; Jer- 
sild, 1954; McBride & Hebb, 1948; 
Melzack, 1952) of the genesis of emo- 
tional behavior in several vertebrate 
species have shown that strong fear 
can frequently be elicited by strange, 
but innocuous, visual stimuli in the 
absence of any specific avoidance 
conditioning. What was found neces- 
sary for the development of this fear 
response was a period of early sensory 
contact with the environment during 
which time the “familiar” apparently 
becomes established. Thus, Melzack 
states that emotional behavior
“... may appear at any stage of the
animal's life when the situation dif- 
fers greatly from any that the animal 
has already encountered” (1954, p. 
167). While the anxiety responses 
of the duckling occur very early in 
ontogeny, there is little doubt that 
they are frequently evoked by un- 
familiar visual stimuli. If the poor 
locomotor and attentive capacities 
that the bird displays during the first 
trial do, in fact, result in limited sens- 
ory contact with many aspects of the 
alley, then it appears reasonable to 
expect that anxiety will be aroused 
later when, by virtue of increased ac-
tivity and general alertness, the bird
subsequently comes into contact with
these unfamiliar stimuli. This ex-
pectation is supported by results
(Hess, 1958) indicating that improve-
ment in locomotor ability is closely
accompanied by an increase in the
frequency of fear responses during
the first 36 hours from the occurrence
of hatching. [...] interesting [...]
conceptualization [...] conditioned [...]
“simple sensory association” (i.e.,
Pavlovian or classical conditioning)
through the contiguity of the CS and
the fear reactions aroused by the on-
set of a noxious stimulus. [...]


EVIDENCE RELEVANT TO THE 
PRESENT ANALYSIS 


The value of any theoretical en- 
deavor should be measured not only 
in terms of its capacity to integrate 
available data but also in terms of the 
extent to which it can generate novel 
(i.e., not previously formulated)
functional relationships. It must be
determined, therefore, whether the
present analysis in terms of anxiety
reduction is capable of mediating pre-
dictions concerning the influence of
certain selected variations in the im- 
printing procedure on the acquisition 
and stability of the following re- 
sponse. We turn now to consider 
some of these predictions. 


Characteristics of the Object 


It has been noted that the struc- 
ture of the avian eye increases sensi- 
tivity to movement. We have sug- 
gested that this sensitivity combined 
with the limited locomotor and at- 
tentive capacities of the newly 
hatched bird probably causes the 
moving test object to dominate the 
visual field—at least during the first 
trial. However, within the frame-
work of the present analysis, any
repetitive visual stimulus that is 
likely to command the attention of 
the animal at a time when its anxiety 
level is low should acquire reward 
value and subsequently function to 
reinforce approach responses when 
anxiety is elicited. Indeed, even 
a prepotent auditory stimulus should 
function in this fashion. Evidence in 
support of the contention that follow- 
ing can be induced by stimuli other 
than that provided by a moving ob- 
ject has been reported recently by 
several investigators.

James (1959) placed Plymouth
Rock chicks in an alley that con-
tained a flashing light at one end and
a continuous light at the opposite
end. The chicks soon showed a
strong preference for the intermit-
tent light as evidenced by the greater
number of approach responses and
by the increasing tendency to re-
main in the vicinity of the light once
it had been approached. Parentheti- 
cally, it might be added that James 
reported the frequency of distress 
calls decreased markedly when the 
bird was in the vicinity of the inter- 
mittent light. It is also interesting to 
note that an object consistently asso- 
ciated with the intermittent light 
subsequently induced following when 
it moved about the alley. The same 
object after being paired with the 
constant light was either ignored or 
avoided. 

With respect to auditory stimuli, 
Collias and Collias (1956) found that 
young ducklings of several species 
will develop a strong tendency to 
move in the direction of a repetitive 
low-pitched sound even when the 
source of the sound is not visible. 
Fabricius and Boyd (1952-1953) also 
reported that rhythmic repetition of 
a brief sound will elicit strong follow- 
ing whether or not it is associated 
with a moving object. Indeed, some 
of the ducklings responded more 
strongly to the retreating sound than 
to the moving object. However, 
whether a visual or auditory stimulus 
was employed, following occurred
only after that stimulus had been
presented “over a period of varying
length.”


It has been suggested (Ramsay,
1951) that although a wide variety
of inanimate objects can induce fol-
lowing, some are perhaps more ef-
fective than others because they bear
a closer resemblance to stimuli nor-
mally provided by the biological
parent. For example, it might be ex-
pected that a model decoy of a ma-
ture hen would elicit stronger follow-
ing in a naive bird than a cardboard
cube, or that the simulated quacking
of an adult duck might serve better
than the rhythmic repetition of a 
nonsense syllable. Such a suggestion 
implies the existence of a central 
neural mechanism that is genetically 
patterned so as to correspond to cer- 
tain stimulus configurations more 
closely than to others. It is further 
implied that the closer the cor- 
respondence between extrinsic stim- 
ulus and neural structure the more 
easily the following response can be
released.

While we have found no evidence 
to indicate that innate perceptual 
schemata of any kind are involved in 
the development of the following 
response, the question raised above 
concerning differential stimulus ef- 
fectiveness should be examined. Con- 
sider the visual modality, and recall 
that we have already treated the 
problem of moving as compared with 
stationary stimuli as it relates to the 
organization of following. While 
there has been no attempt to rank 
order moving objects with respect to 
the degree to which they resemble the 
conspecific parent and then to test 
for effectiveness in inducing follow- 
ing, the limited data that are avail- 
able indicate that there is no positive 
relationship. Hinde, Thorpe, and 
Vince (1956), for example, found 
that a black box was somewhat more 
strongly followed than a moorhen 
model by both moorhens and coots. 
However, when all the objects em-
ployed were considered, they con-
cluded that there was “... no indi-
cation that any particular character-
istic, other than movement, was of
especial importance” (p. 224).
Fabricius also reports that shape and,
within wide limits, size are unrelated
either to the occurrence or to the
stability of following. To emphasize
the point he exposed ducklings to the
“... gliding forward movement of a
creature with exactly the shape of a
swimming female duck” (1951a, p.
276). Not only did this object fail to
induce stronger following as com-
pared with other objects, but it was
somewhat inferior to the human ob- 
server. 
In the writer’s laboratory, duck-
lings were exposed either to a green
box or to a model decoy of an adult
Peking. It was found, contrary to
the data reported above, that the de-
coy elicited stronger following than
the box (P=.08; two-tailed test).
While a subsequent replication failed
to confirm this finding, so that it ap-
pears to have been the result of sam-
pling error, what theoretical implica-
tions would such a difference have
entailed had it been confirmed? For
example, would it have indicated, to
use Ramsay’s words, “an innate
ability to respond to the biologically
correct object”? (1951, p. 12). Per-
haps the question should be made
even more general: does consistently
stronger following (by naive birds)
of one object as compared with an-
other, irrespective of whether that
object is “biologically correct,” [...]
it were to be found, [...]
fruitfully analyzed in terms of the ex-
citation of [...] bird’s eye [...]
For example, subsequent
[...] readiness to follow
certain moving objects could be a
function of the greater amount of
retinal flicker produced by these ob-
jects during the critical period by
virtue of their size, shape, and rate of
movement. It is not unreasonable to 
expect amount of retinal flicker, or 
similar receptor-excitation effects, to 
be related to the extent to which the 
animal orients toward a particular 
object and hence to the probability of 
that object dominating its visual en- 
vironment. On the basis of the pres- 
ent analysis, we would expect the 
most salient object during the criti- 
cal period to have the greatest likeli- 
hood of acquiring anxiety-reducing 
value and of being subsequently fol- 
lowed. In this way differences among 
objects with respect to their general 
stimulative effects might become 
“translated” into preferences that 
the animal exhibits, making it in-
deed gratuitous to speak of some pre-
formed “conception” of the preferred
object.


Imprinting Effectiveness and the Oc- 
currence of Following 


We have conceived the initiation
of following as being governed by
two processes. The first process con-
sists of the association or conjunction
of an attention-evoking stimulus and
a low-anxiety drive [...]
of which certain reacti[...]
of the drive state [...]
[...] processes that we
[...] do, in fact, govern the
development of following, then it
should be possible to manipulate the
first process independently of the
second. In other words, we would ex-
pect the object to acquire reward
value independently of the occur-
rence of the following response. In-
deed, it should be possible to delay
opportunity to follow until well past 
the critical period, as long as initial 
exposure occurs during the appropri- 
ate time. It is predicted that when 
following is then permitted, the ani- 
mal will respond as strongly to the 
object as it does when it is allowed to 
govern its own behavior. 

There are several studies which ap- 
pear to support this expectation. For 
example, Moltz, Rosenblum, and 
Stettner (unpublished) placed Pek- 
ing ducks in an apparatus designed 
specifically to restrict movement 
while permitting an unobstructed 
view of an object that traveled about 
a small area. The Ss were exposed 
to the object in this fashion for 25 
minutes per day for three days; ini- 
tial exposure having occurred at ap- 
proximately 10 hours of age. When 
subsequently introduced into the 
alley and allowed to follow this ob- 
ject, they responded in a manner in- 
distinguishable from control Ss that 
had been actively pursuing the object 
for an identical period of time.

James² used essentially the same
procedure with Plymouth Rock
chicks as Moltz et al. had used with
ducklings except that a flashing light
replaced the moving object. When
subsequently permitted to respond,
these Ss approached the light as fre-
quently and as vigorously as control
Ss whose activity had not been previ-
ously restricted.

Jaynes (1958a) also presented data
that are in accord with the assump-
tion that the acquisition of reward
value by an object is not dependent
upon the occurrence of overt follow-
ing. He exposed chicks of various
ages to a moving cardboard cube for
one 30 minute period. The object
was then removed for 10 days after
which time S was again placed in the
apparatus and allowed to follow. Our
interest is directed toward those ani-
mals that showed no “overt” signs of
following during initial exposure. It
was found that these Ss responded
somewhat more strongly than Ss
that had previously shown some
tendency to follow the object. Jaynes
used the term “latent imprinting” to
emphasize the fact that the effects of
exposure to a moving object might
not become manifest in behavior
until several days later.

² H. James. Personal communication, 1958.

A study reported recently by Hess 
(1957) contains data which do not ap- 
pear to be in accord with theoretical 
expectation, since they indicate a re- 
lationship between energy expendi- 
ture and imprinting effectiveness. 
Hess exposed mallard ducklings to a 
model decoy that moved about a cir- 
cular alley. The distance the decoy 
traveled, and consequently the en- 
ergy expended by those ducklings 
that followed closely, was varied 
while duration of exposure was held
constant. A choice test was then em-
ployed involving the familiar model
(now stationary) and an unfamiliar 
model of a different color (either 
moving or stationary). It was found 
that increasing the distance over 
which the duckling was required to 
follow functioned to increase percent- 
age choice of the familiar model. 

Since this finding has been consid- 
ered to reveal a unique feature of the 
imprinting process (Thorpe, 1956; 
Verplanck, 1958) we must carefully 
examine its relation to the present 
hypothesis. First, percentage choice 
was the response measure that Hess 
employed, while our hypothesis has 
been formulated primarily in terms 
of strength of following. The reason 
for restricting the hypothesis in this 
way is that percentage choice and
strength of following have been
found to provide different estimates
of imprinting effectiveness. Jaynes
(1958b), for example, reported that
chicks which had strongly followed a
[...] subse-
quently failed to choose consistently
between it and a strange object when
both were stationary. [...]
and the writer [...]
technique that does not require any
change in test conditions. [...]
and imprinting effective-
ness. On the basis of what has
been said regarding the generaliza-
tion decrement involved in the use of
the choice technique, it would appear
reasonable to expect a procedure that
allows the object to retain the same
speed and pattern of motion through-
out the experiment to be superior, or
at least equal, to the choice technique
in reflecting differences in imprinting
effectiveness. Certainly, if Hess’ re-
sults point to a fundamental feature
of imprinting, it should be revealed 
quite clearly by using strength of fol- 
lowing as the response measure. 

However, when the following re-
sponse was employed (without in-
volving any change in the stimulus
characteristics of the test object) no
relation between energy expenditure
and imprinting effectiveness was
found. As was noted above, birds ex-
posed to, but not permitted to follow,
[...] opportunity was [...]
these animals [...]
It appears reason-
able to maintain, therefore, that
Hess’ statement that
“... the strength of imprinting
equals the logarithm of the effort ex-
pended by the animal during the im-
printing period” (p. 85) requires ad-
ditional empirical support before it
can be generally accepted. 


Anxiety Level and the Critical Period 


The results of several studies indi- 
cated that there is a period in on- 
togeny during which initial exposure 
to a moving object is most likely to 
induce subsequent following. We 
have maintained that the importance 
of this period with respect to the 
organization of the following re- 
sponse is a function, not of some spe- 
cifically preformed “growth plan,” 
but of the presence of a low-anxiety 
drive that provides the occasion for 
the conditioning of certain associated 
responses. The fact that this drive 
state is most often prepotent during 
a relatively brief stage of behavioral 
development largely limits the effec- 
tiveness of initial exposure to the 
first 15 or 20 hours from the occur- 
rence of hatching. The present an- 
alysis suggests, however, that if the 
sequence of events assumed to be in- 
volved in the organization of the fol- 
lowing response were to reoccur at a 
later stage of development, then it 
should be possible to establish the 
response in an animal to which the 
imprinting procedure had not been 
previously administered. In other 
words, it is predicted that irrespec- 
tive of age, the simultaneous occur- 
rence of a low-anxiety drive and a 
visually dominant (but unfamiliar) 
object, will result in vigorous follow- 
ing if anxiety is subsequently aroused 
in the presence of that object either 
through the introduction of shock or 
through the introduction of novel 
(but innocuous) visual stimuli. 

There are, of course, many prob- 
lems involved in testing this hypothe- 
sis; the major problem being that of 
drive manipulation. Recall that a 
naive duckling older than 20 or 25
hours will exhibit marked avoid-
ance of a completely strange object,
making it difficult to establish the
initial phase of the assumed drive
sequence in the presence of that ob- 
ject. Of course, allowing sufficient 
time for habituation to occur might 
result in the desired drive level, the 
response components of which could 
then become conditioned to the ob- 
ject. Subsequent exposure, but in an 
unfamiliar stimulus situation (which
would presumably result in anxiety 
arousal) might then induce strong 
following due to the previously ac- 
quired anxiety-reducing value of the 
object. However, if exposure initially 
elicits fear, an opportunity is pro- 
vided for components of the fear re- 
action to become conditioned to the 
object, this event being incompatible 
with the drive sequence assumed nec- 
essary for the occurrence of following. 
Gray (1958) has demonstrated that 
such conditioning can occur quite 
rapidly and often remains evident for 
some time. Thus, although data have 
been obtained (Jaynes, 1957) which
indicate that prolonged habituation 
will frequently result in a “high level 
of following,” a procedure that would 
eliminate, or at least reduce, the fear 
manifested initially by juvenile and 
adult birds in response to a strange 
object would certainly be preferable. 
In short, in order to test the hy- 
pothesis under consideration we re- 
quire a method that provides a more 
“direct” way of reestablishing the 
low-anxiety state that was present 
during the critical period. 

A promising lead in this respect 
has been offered recently by Hess 
(1957) in connection with a study in-
volving the use of such “tranquiliz-
ers” as chlorpromazine and mepro-
bamate. He reported that these
drugs produced a marked reduction
of emotionality when administered
to mallard ducklings ranging in age
from 24 to 32 hours. Of interest is
the fact that the use of chlorpromazine
served to increase imprinting effec-
tiveness at an age when the imprint-
ing procedure is largely ineffective.
It was found that if naive animals,
chronologically past the critical pe-
riod, were exposed to an unfamiliar
object while under the influence of
chlorpromazine, they subsequently
(after the drug had worn off) re-
sponded more frequently to that ob-
ject than control animals of the same
age to which either water or Nem-
butal had been administered. Unfor-
tunately, while this result is in ac-
cord with the present analysis, the
fact that meprobamate did not prove
effective when employed under the
same conditions renders interpreta-
tion difficult. Perhaps the contra[...]
[...] development of the following
response. It is predicted that, irre-
spective of age, exposure to an un-
familiar test object while under the
influence of such agents (especially
those that function as “autonomic
suppressants”) will result in strong
following if the animal is subse-
quently subjected to experimental
conditions calculated to arouse anxi-
ety.

The analysis that has been pre-
sented in the present paper has been
concerned with the manner in which
anxiety appears to be related to the
[...] data, obtained through systematic
study of a variety of avian species,
are required before the parameters of
this relationship can be specified pre-
cisely. Further experimental work
[...] effort might also be
[...] determining whether
[...] endeavor [...]


REFERENCES
[...] & Boyd, H. Parent-young recog[...]

[...]9-263.

[...] The chemistry and mode of [...]
tranquilizing drugs. Ann. NY
Acad. Sci., 1957, 67, 685-699.

Collias, N. E. The development of social be-
havior in birds. Auk, 1952, 69, 127-159.

Collias, N. E. The analysis of socialization in
sheep and goats. Ecology, 1956, 37, 228-[...]

Collias, N. E., & Collias, Elsie C. Some
mechanisms of family integration in ducks.
Auk, 1956, 73, 378-400.

Fabricius, E. [...] Acta Zool. Fennica,
1951, 68, 1-175. (a)

Fabricius, E. [...] Orn. Congr.,
1951, 375-379. (b)

Fabricius, E., & Boyd, H. [...]
the following reactions of ducklings. Wild-
fowl Trust Ann. [...]

Fantz, R. L. Form preferences in newly
hatched chicks. J. comp. physiol. Psychol.,
1957, 50, 422-430.

Gray, P. H. Theory and evidence of imprint-
ing in human infants. J. Psychol., 1958,
46, 155-166.

Hebb, D. O. On the nature of fear. Psychol.
Rev., 1946, 53, 250-275.

Hediger, H. Wild animals in captivity.
London: Butterworth, 1950.

Heinroth, O. Beiträge zur Biologie, nament-
lich Ethologie und Psychologie der Anati-
den. Verhandl. 5th Int. Ornithol. Kongr.,
1910, 589-702.

Hess, E. H. Effects of meprobamate on im-
printing in waterfowl. Ann. NY Acad.
Sci., 1957, 67, 724-732.

Hess, E. H. Evidences for, and theories of,
imprinting. Paper presented at American
Association for the Advancement of Sci-
ence, Washington, D. C., December, 1958.
(a)

Hess, E. H. “Imprinting” in animals. Scient.
Amer., March 1958, 81-90. (b)

Hinde, R. A. The following response of
moorhens and coots. Brit. J. anim. Behav.,
1955, 3, 121. (Abstract) (a)

Hinde, R. A. The modifiability of instinctive
behaviour. Advanc. Sci., Lond., 1955, 12,
19-24. (b)

Hinde, R. A., Thorpe, W. H., & Vince,
M. A. The following response of young
coots and moorhens. Behaviour, 1956, 9,
214-242.

James, H. Flicker: An unconditioned stimu- 
lus for imprinting. Canad. J. Psychol., 
1959, 13, 59-67. 

Jaynes, J. Imprinting: The interaction of
learned and innate behavior: I. Develop-
ment and generalization. J. comp. physiol.
Psychol., 1956, 49, 201-206.

Jaynes, J. Imprinting: The interaction of
learned and innate behavior: II. The crit-
ical period. J. comp. physiol. Psychol., 1957,
50, 6-10.

Jaynes, J. Imprinting: The interaction of
learned and innate behavior: III. Practice
effects on performance, retention, and fear.
J. comp. physiol. Psychol., 1958, 51, 234-
237. (a)

Jaynes, J. Imprinting: The interaction of
learned and innate behavior: IV. General-
ization and emergent discrimination. J.
comp. physiol. Psychol., 1958, 51, 238-242.
(b)

Jersild, A. T. Child psychology. New Jersey:
Prentice Hall, 1954.

Lorenz, K. Der Kumpan in der Umwelt des
Vogels. J. Orn. Lpz., 1935, 83, 137-213;
289-413.

Lorenz, K. The companion in the bird's
world. Auk, 1937, 54, 245-273.

Lorenz, K. Morphology and behavior pat-
terns in closely allied species. In B. Schaff-
ner (Ed.), Group processes. New York:
Josiah Macy Found., 1955. Pp. 168-218.

Lorenz, K. Companionship in bird life. In
Claire H. Schiller (Ed.), Instinctive be-
havior: The development of a modern concept.
New York: Int. Univer. Press, 1957.

McBride, A. F., & Hebb, D. O. Behavior of
the captive bottle-nose dolphin, Tursiops
truncatus. J. comp. physiol. Psychol., 1948,
41, 111-123.

Melzack, R. Irrational fears in the dog.
Canad. J. Psychol., 1952, 6, 141-147.

Melzack, R. The genesis of emotional be-
havior: An experimental study of the dog.
J. comp. physiol. Psychol., 1954, 47, 166-168.

Melzack, R., & Scott, T. H. The effects of
early experience on the response to pain. J.
comp. physiol. Psychol., 1957, 50, 155-161.

Melzack, R., & Thompson, W. R. Effects of
early experience on social behaviour. Canad.
J. Psychol., 1956, 10, 82-90.

Menner, E. Die Bedeutung des Pecten im
Auge des Vogels für die Wahrnehmung von
Bewegungen, nebst Bemerkungen über
seine Ontogenie und Histologie. Zool.
Jb., Abt. allg. Zool. Physiol., 1938, 48,
481-583.

Moltz, H. Resistance to extinction as a func-
tion of variations in stimuli associated with
shock. J. exp. Psychol., 1954, 47, 418-424.

Moltz, H., & Rosenblum, L. A. Imprinting
and associative learning: The stability of
the following response in Peking ducks
(Anas platyrhynchous). J. comp. physiol.
Psychol., 1958, 51, 580-583. (a)

Moltz, H., & Rosenblum, L. A. The relation
between habituation and the stability of
the following response. J. comp. physiol.
Psychol., 1958, 51, 658-661. (b)

Moltz, H., Rosenblum, L. A., & Halikas,
Nina. Imprinting and level of anxiety. J.
comp. physiol. Psychol., 1959, 52, 240-244.

Mowrer, O. H. Learning theory: Historical
review and re-interpretation. Harvard
educ. Rev., 1954, 24, 31-58.

Nice, Margaret. Some experiences in im-
printing ducklings. Condor, 1953, 55, 33-37.

Pumphrey, R. J. The sense organs of birds.
Ibis, 1948, 90, 171-199.

Ramsay, A. O. Familial recognition in do-
mestic birds. Auk, 1951, 68, 1-16.

Ramsay, A. O., & Hess, E. H. A laboratory
approach to the study of imprinting. Wilson
Bull., 1954, 66, 196-206.

Schlosberg, H. The relationship between
success and the laws of conditioning. Psy-
chol. Rev., 1937, 44, 379-394.

Schneirla, T. C. Interrelationships of the
‘innate’ and the ‘acquired’ in instinctive
behavior. In L'Instinct dans le comporte-
ment des animaux et de l'homme. Paris:
Masson et Cie, 1956. Pp. 387-452.

Schneirla, T. C. The concept of develop-
ment in comparative psychology. In Dale
B. Harris (Ed.), The concept of development:
An issue in the study of human behavior.
Univer. Minn. Press, 1957. Pp. 78-108.

[...] Brush, Elinor. Experi[...]
conceptions of anxiety and aversion. In
M. R. Jones (Ed.), Nebraska Symposium
on motivation. Univer. Nebr. Press, 1956.
Pp. 212-305.

Spalding, D. A. Instinct: With original ob-
servations on young animals. MacMillan's
Magazine, 1873, 27, 282-293. Reprinted in
Brit. J. anim. Behav., 1954, 2, 2-11.

Steven, D. M. Transference of “imprinting”
in a wild gosling. Brit. J. anim. Behav.,
1955, 1, 14-16.

Thorpe, W. H. The evolutionary significance
of habitat selection. J. anim. Ecol., 1945,
14, 67-70.

Thorpe, W. H. The learning abilities of
birds. Ibis, 1951, 93, 1-52; 252-296.

Thorpe, W. H. Learning and instinct in ani-
mals. Cambridge: Harvard Univer. Press,
1956.

Verplanck, W. S. Imprinting and learning.
Paper presented at the American Associa-
tion for the Advancement of Science,
Washington, D. C., December, 1958.


(Received April 2, 1959)


Psychological Bulletin
1960, Vol. 57, No. 4, 315-317


CONTRADICTORY CONCLUSIONS FROM TWO SPEED OF 
PERFORMANCE MEASURES 


EUGENE S. EDGINGTON 
Kansas State Teachers College, Emporia 


Speed of performance can be ex- 
pressed in terms of amount performed 
per unit of time or amount of time 
per unit of performance. The follow- 
ing example will show how different 
the interpretation of computed sta-
tistics can be for these two alterna-
tive measures of speed of perform-
ance.

The speeds shown in Table 1 are
hypothetical running speeds for two
rats, each rat being sampled on two 
different occasions. The same ob- 
servations are expressed in two ways 
in the table. Rat A on the average 
travelled more feet per second and 
was, therefore, faster. However, the 
same rat, Rat A, on the average took 
more seconds per foot travelled and 
was, therefore, slower. 


TABLE 1
SPEEDS OF RATS EXPRESSED IN TWO WAYS

                                 Seconds per Foot
           Feet per Second       (1/Feet-per-Second)
           Rat A     Rat B       Rat A     Rat B
           2         4           .50       .25
           8         5           .125      .20
Totals:    10        9           .625      .45
Means:     5         4.5         .3125     .225
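
The contradiction in Table 1 can be checked directly. The following is a minimal sketch in Python (a language the article itself does not use; the variable names are illustrative assumptions) that recomputes both sets of means from the same four observations.

```python
from statistics import mean

# Hypothetical running speeds from Table 1, in feet per second.
feet_per_second = {"A": [2.0, 8.0], "B": [4.0, 5.0]}

# The same observations expressed as seconds per foot (reciprocals).
seconds_per_foot = {
    rat: [1.0 / v for v in speeds] for rat, speeds in feet_per_second.items()
}

# Arithmetic means of each measure, per rat.
mean_speed = {rat: mean(v) for rat, v in feet_per_second.items()}
mean_time = {rat: mean(v) for rat, v in seconds_per_foot.items()}

# Rat A has the larger mean speed (faster) ...
print(mean_speed)
# ... and also the larger mean time per foot (slower).
print(mean_time)
```

Both comparisons favor Rat A, although a larger mean speed and a larger mean time per foot point in opposite directions.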


One way of settling the contradiction might be to accept the
convention of physicists to express velocity only in terms of
distance per unit of time, but this solution would not satisfy anyone
who considered distance per unit of time and time per unit of
distance to be equally valid expressions of speed.


Frequently in research the running
times and latency times of rats are 
subjected to a reciprocal transforma- 
tion prior to statistical analysis; that 
is, the statistical analysis is carried 
out, using the reciprocals of the run- 
ning times and the reciprocals of the 
latency times. However, a significant 
difference between means of the re- 
ciprocals of running times cannot be 
interpreted as indicating a significant 
difference between means of running 
times. The same is true of latency 
times and their reciprocals. 

Furthermore, the nonlinearity of 
the relationship between the two re- 
ciprocally related measures (Y=1/X) 
indicates that there can be incon- 
sistency between them for any statis- 
tic that is derived from addition or 
subtraction of measurement values. 
Among the statistics subject to in- 
consistency are these conventional 
parametric statistics: the mean, the 
standard deviation, and the product- 
moment correlation coefficient. 

Any computed significance levels 
that are based on addition or sub- 
traction of the measurement num- 
bers can be different for the two 
measures; thus, one of the measures 
might lead to the rejection of a null 
hypothesis while the other measure 
might not. 

Up to this point the discussion has 
dealt with running times and latency 
times of rats, and the reciprocals of 
these measures. However, the statistical contradictions presented are
not the result of carrying out the statistical analysis with the
reciprocals instead of the original data. Either of the two
reciprocally related measures is equally valid as a measure of speed
of performance, and either one could be considered to be the
reciprocal of the other. The point is that there are two precise
measures of speed of performance which can lead to [...]

[...] applies to speed of performance of any kind. For example, it
applies to measuring manual dexterity with a pegboard in terms of the
number of pegs per unit of time or in terms of [...] Another [...]
time per nonsense syllable memorized. And, [...] measured, the
significance level obtained can be inconsistent with what would have
been obtained had the reciprocally related measure been used.


PROPOSED SOLUTION


An obvious way to avoid the contradictions [...] speed of performance
would be to confine computations to order statistics, e.g., [...]
order of the original data, not the actual measurement values. The
median, mode, and rank correlation coefficient are examples.

The objection to this negative kind of solution is that precision of
measurement is wasted, and consequently computed statistics would be
less sensitive [...]

[...] measures for any computed statistic. A logarithmic
transformation of the measurement values (either performance per unit
of time or time per unit of performance) prior to statistical
analysis accomplishes this result. Since the relationship between the
two reciprocally related measures is Y = 1/X, the relationship
between their logarithms is log Y = -log X; the values are identical,
but have opposite signs. Since the numerical values of the logarithms
are identical, all statistics computed lead to consistent
conclusions. When a statistic is computed for one measure, the
conclusion based on it is exactly the same as would be the conclusion
based on the same statistic computed for the reciprocally related
measure.
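
The effect of the transformation can be illustrated with the Table 1 observations. The sketch below is in Python, with illustrative names that are assumptions and not part of the original; it shows that the mean logarithm of one measure is the negative of the mean logarithm of the other, so the two measures now rank the rats the same way.

```python
import math
from statistics import mean

# Table 1 observations, in feet per second.
feet_per_second = {"A": [2.0, 8.0], "B": [4.0, 5.0]}


def mean_log(values):
    """Arithmetic mean of the natural logarithms of the values."""
    return mean(math.log(v) for v in values)


# Mean log of speed, and mean log of the reciprocal measure (time per foot).
log_speed = {rat: mean_log(v) for rat, v in feet_per_second.items()}
log_time = {
    rat: mean_log([1.0 / x for x in v]) for rat, v in feet_per_second.items()
}

for rat in feet_per_second:
    # Identical magnitude, opposite sign: log Y = -log X.
    print(rat, log_speed[rat], log_time[rat])
```

On the logarithmic scale Rat B has the higher mean speed and the lower mean time per foot, so both measures agree that Rat B is the faster. The antilogarithm of each mean logarithm is the geometric mean of the original values.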


[...] two measures. The arithmetic mean of the logarithms is the
logarithm of the [...] the antilogarithm [...] But why should one
test [...] of geometric means [...] he is interested in the
arithmetic means of the measures? The [...] should not test for [...]
metric means in [...], the inconsistency that is associated with
[...] arithmetic mean of [...] performance measurements [...] the
meaningfulness of hypotheses about [...] of performance. In [...]
relative meaningful[...] statistics that may be computed from the
logarithms of the measurement numbers.

Since most nonparametric statis- 
tics would lead to consistent results 
even without the logarithmic trans- 
formation, the principal advantage of 
the logarithmic transformation is in 
regard to parametric statistics. It is 
pertinent, therefore, to consider the 
normality of the distribution of log- 
arithms of speed of performance 
measurements. Now, the reciprocal 
relationship between the two speed of performance measures requires
that for one to be normally distributed the other must not be. This
complicates the normality assumption for the speed of performance
measures themselves. On the other hand, if the logarithms of one of
the two measures
are normally distributed, the loga- 
rithms of the other measure also must 
be normally distributed. Another 
point in favor of the logarithmic 
transformation is that speed of per- 
formance measurements often show a 
positive skewness, which is consistent 
with the assumption that their log- 
arithms are normally distributed, 
since a logarithmic transformation 
tends to reduce positive skewness.


SIGNIFICANCE TESTS FOR MULTIPLE COMPARISON OF
PROPORTIONS, VARIANCES, AND OTHER STATISTICS


THOMAS A. RYAN 
Cornell University 


Most procedures for multiple comparison which have been [...] have
been concerned only with com[...] paper, however, [...] for testing
multiple [...] “method of adjusted [...] and shall illustrate [...]
able to [...] test. Providing [...] of the distributions, we can also
adapt the methods which have been developed especially for means.
Since this provides a slightly more powerful test for proportions
than the method of adjusted significance levels, it is to be
preferred in this particular case, but it cannot be applied to the
other cases. Consequently, we shall begin with the more general
procedure.


METHOD OF ADJUSTED SIGNIFICANCE LEVELS


Like several of the methods for multiple comparison of means, this
method is also based upon testing [...]. That is, the highest and
lowest values are compared first. If they are not found to differ
significantly, no further tests are made and we conclude that all
samples could have been drawn from [...]. If the highest [...] values
are found [...] we then proceed to test other pairs in order from the
[...]. Whenever the extremes of [...] are found to [...] conclude
that [...] compared. [...] “adjusted significance levels.” By
suitable choice of nominal significance levels, we can control the
error rate experimentwise for the whole set of comparisons. In other
words, we are able to state the probability that one or more of our
conclusions will be incorrect in that we falsely claim significant
differences. (See Ryan [1959a] for arguments in support of
controlling the error experimentwise.)

¹ The nominal level of significance refers to the probability value
given in the tables which are ordinarily used for comparing two
samples alone.

Specifically, the method of adjusting the significance levels is as
follows:

1. If we wish the error rate experimentwise to be α, the first test
of the extreme values is made at the nominal level 2α/n(n-1), where n
is the number of samples being compared. That is, we use whatever
tables would ordinarily be used in testing the difference between a
single pair of samples, but we enter them for the two-tailed
probability of 2α/n(n-1) instead of α.

2. If the extremes turn out to be significantly different we test
each extreme against the sample next to the other extreme and we do
this at the nominal level of 2α/n(n-2).

3. When we find a significant difference in Step 2 we proceed to test
smaller subgroups. The general rule is that, if we are testing a
particular subgroup of k samples, we test the extremes of that
subgroup at the nominal level of 2α/n(k-1). Notice that this adjusted
significance level takes into account both the size of the particular
subgroup (k) and the size of the total set of samples (n). (These
values of the adjusted significance level apply only if the tests are
made in layers as described above.)

The basis of the method. If the complete null hypothesis is true (if
all of the samples come from a single population), and if we test all
of the differences at the nominal level of 2α/n(n-1), the error rate
per experiment² is α. This is because we would be making n(n-1)/2
comparisons and the average rate of error per experiment is the
product of the number of comparisons times the rate of error per
comparison (see Ryan, 1959b, p. 39).

The rate of error per experiment is always greater than the error
rate experimentwise. Thus, if all differences were tested at the
nominal rate of 2α/n(n-1), the error rate experimentwise would be
less than α. In other words, the probability that one or more of the
differences will be large enough to be considered significant on this
basis will be less than α. Consequently, the probability that the
greatest difference will exceed the level required for this nominal
rate is less than α.

² The rate of error per experiment is the expected number of errors
per experiment in the long run (Ryan, 1959b). Specifically, if a long
series of experiments, each with several comparisons, were actually
carried out and we counted the number of erroneous conclusions the
three main rates of error would be:

rate per comparison = (no. of comparisons incorrectly called significant) / (total number of comparisons)

rate per experiment = (no. of comparisons incorrectly called significant) / (total number of experiments)

rate experimentwise = (no. of experiments containing erroneous statements of significance) / (total number of experiments)

Thus the rate of error experimentwise can never be greater than the
rate per experiment, because we cannot have more experiments with
erroneous conclusions than there are errors in individual
comparisons.

In the layer method we have described, only the extreme difference is
tested at the level 2α/n(n-1). The argument of the preceding
paragraph shows, however, that this is sufficient to ensure that the
rate of error experimentwise is less than α when the complete null
hypothesis is true. By changing to the experimentwise basis of
measuring error rate, we are able to test the internal differences at
a less stringent level without overstepping the limit of α for the
rate of error. We shall, however, be increasing the rate of error per
experiment under the complete null hypothesis unless we make all of
the tests at the same nominal level. Working in layers with changing
levels of nominal significance as we have proposed increases the
power of the test, without sacrificing control of the experimentwise
rate of error.
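
The adjusted nominal levels are simple to tabulate. The following is a minimal sketch in Python, assuming that the expression 2α/n(k-1) is read as 2α divided by the product n(k-1); the function names are illustrative and do not come from the original, and the critical value of z is taken from the standard library's normal distribution in place of printed tables.

```python
from statistics import NormalDist


def nominal_level(alpha, n, k):
    """Two-tailed nominal level 2*alpha / (n*(k-1)) for a subgroup of k
    samples out of a total of n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    return 2.0 * alpha / (n * (k - 1))


def critical_z(level):
    """Standard score cutting off the given two-tailed probability."""
    return NormalDist().inv_cdf(1.0 - level / 2.0)


# Levels for n = 5 samples at alpha = .05, from the full range down to pairs.
levels = {k: nominal_level(0.05, 5, k) for k in range(5, 1, -1)}
for k, level in levels.items():
    print(k, round(level, 4), round(critical_z(level), 2))
```

For five samples at the 5% level experimentwise this gives nominal levels of .0050, .0067, .0100, and .0200 for subgroups of 5, 4, 3, and 2 samples, which are progressively less stringent as the subgroup shrinks.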


The choice of the nominal level of significance for the tests of the
internal layers is based upon various partial null hypotheses. The
justification for the level which is to be used can be given for any
partial null hypothesis as follows: [...] complete null hypothesis is
not true, [...] layers, since it is no [...] to conclude that the
[...] differ significantly. There are fewer possible errors to be
made, and we can therefore be less stringent in testing the internal
differences.

Suppose that the true state of affairs consists of several different
populations with several samples from each. In general the situation
may be represented as:

k₁ samples from Population A
k₂ samples from Population B, with μ_A > μ_B
    (μ = population value of the statistic being studied)
k₃ samples from Population C
etc.
Σk_i = n (the total number of samples)

Let us assume that the populations are far enough apart that we can
ignore errors which place samples from different populations as
significantly different in the wrong order, and that it is almost
certain that such samples will be correctly judged significant in the
correct order. (This assumption is needed only for clarity of
explanation; it will be shown below to be unnecessary in general.)

Following the [...]

$$EW \le 1 - \left(1 - \frac{k_1}{n}\alpha\right)\left(1 - \frac{k_2}{n}\alpha\right)\left(1 - \frac{k_3}{n}\alpha\right)\cdots$$

[...], but fortunately their sum is a good ap[...] complete
expression [...]

$$EW \le \frac{k_1}{n}\alpha + \frac{k_2}{n}\alpha + \cdots - \frac{k_1 k_2}{n^2}\alpha^2 - \cdots$$

If we keep α small, say .05, the terms involving α² and higher powers
will affect EW very little. In fact, since the squared terms are all
negative, EW will be somewhat overestimated if we take only the terms
involving the 1st power. Approximately, then, the error rate
experimentwise will be:

$$EW \le \frac{k_1}{n}\alpha + \frac{k_2}{n}\alpha + \frac{k_3}{n}\alpha + \cdots = \frac{\sum k_i}{n}\alpha = \alpha$$


For example, suppose there are 10 samples consisting of 2 samples
from each of 5 populations, and take α = .05; then k₁ = 2, k₂ = 2,
etc.

$$EW \le 1 - \left(1 - \frac{2}{10}(.05)\right)\left(1 - \frac{2}{10}(.05)\right)\left(1 - \frac{2}{10}(.05)\right)\cdots$$

$$EW \le 1 - (1 - .01)^5$$

$$EW \le .0490099501$$

Thus the total error rate is divided up among the different groups of
samples in proportion to the number of samples from the same
population.³ It would have been better to divide the error rate in
proportion to the number of comparisons in each group divided by the
total number of erroneous comparisons which are possible. Since we do
not know the latter in the absence of advance knowledge about the
true state of affairs, we cannot divide the error rate in this way.
It is, however, possible to divide the error rate in proportion to
the number of samples involved in each population.

³ For some particular null hypotheses involving a single mean from
one of the populations, the general formula for the adjusted
significance level is slightly conservative. This is because a
portion of the total error rate (1/n) is allotted to this single
sample, even though there will be no errors involving this mean
except for false reversals when the populations are close together.
It seemed best to keep the procedure uniform at the expense of this
slight loss of power. For example, suppose that five samples come
from one population and one from another. The total error rate will
be 5α/6 instead of α.

Let us come back now to the problem of erroneous reversal, i.e.,
where a sample from Population A is called significantly less than a
sample from Population B, when in fact the conclusion should have
been just the reverse. This kind of error is also taken care of by
the layer method. Suppose we start with the complete null hypothesis
and separate out the errors involving a particular sample value m_A
(and its corresponding population value μ_A).

In the first stage of testing the total range of all the samples,
there will be three classes of errors which might be made:

1. Conclusions that μ_A is less than the value of μ in some other
population.

2. Conclusions that μ_A is greater than some other population value
of μ.

3. Conclusions that other pairs of populations are different from
each other.

Now suppose that Population A has a higher value of μ than all the
rest, while all of the other samples come from a single population.
The probability of errors of the first group will then be less than
it was under the complete null hypothesis. The conclusions in the
second group will no longer be errors, since they state the correct
relationship between a pair of populations. Errors in the last group
will be somewhat less likely than before because they will occur only
if some sample from the lower population surpasses the sample from A.
[...] probability of error is reduced, both because there are fewer
possible errors [...] probability of these errors is reduced.

Confidence limits. The layer method applies only to tests of
significance. If we wish to state our results in terms of confidence
limits or in[...] between -.10 and .24,” and so on. We [...]
Population A and Po[...] between 2 and 6,” [...] variances of
Population A and Population C is between [...] of a pair of samples.
It is only necessary to adjust the nominal level or error rate per
comparison to compensate for the number of confidence statements
being made. If the nominal error rate is 2α/n(n-1) for each
statement, then the error rate for the whole set of statements will
be α. This error rate is the rate per experiment, and is also an
upper limit for the error rate experimentwise.

Since the allowance for confidence limits is the same as the
difference required in testing the extremes of the [...] because they
are more [...] to compute, and also are more widely used by [...] at
the present time.

SPECIFIC PROCEDURE FOR PROPORTIONS


In order to show how the general method [...], beginning with [...]

TABLE 1
ILLUSTRATION OF MULTIPLE COMPARISONS OF PROPORTIONS AT 5% LEVEL EXPERIMENTWISE

Data
                                  Samples (Training Methods)
                                  B      D      C      A      E      Total
No. of “High Scorers”             12     10     36     36     37     131
No. in sample                     60     40     80     60     50     290
Proportion of “High Scorers”      .20    .25    .45    .60    .74    .45

Method of Adjusted Significance Levels
(SE_d = √[p(1-p)(1/N_i + 1/N_j)]; RD_k = SE_d · z_k)

Group   k   Nominal level   z_k    p     SE_d    RD_k   Observed     Significance
            2α/n(k-1)                                   difference
B-E     5   .0050           2.81   .45   .0952   .268   .54          s
B-A     4   .0066           2.72   .39   .0890   .242   .40          s
D-E     4   .0066           2.72   .52   .1060   .288   .49          s
B-C     3   .0100           2.58   .32   .0797   .206   .25          s
D-A     3   .0100           2.58   .46   .1017   .262   .35          s
C-E     3   .0100           2.58   .57   .0892   .230   .29          s
BD      2   .0200           2.33   .22   .0846   .197   .05          ns
DC      2   .0200           2.33   .38   .0942   .219   .20          ns
CA      2   .0200           2.33   .51   .0854   .199   .15          ns
AE      2   .0200           2.33   .66   .0906   .211   .14          ns

Tukey Method

Group   k   SR_k/√2   (SR_k + SR_n)/(2√2)   WSD_k   RD_k (for comparison)
B-E     5   2.73      2.73                  .260    .268
B-A     4   2.57      2.65                  .236    .242
D-E     4   2.57      2.65                  .281    .288
B-C     3   2.34      2.54                  .202    .206
D-A     3   2.34      2.54                  .258    .262
C-E     3   2.34      2.54                  .227    .230
BD      2   1.96      2.35                  .199    .197
DC      2   1.96      2.35                  .221    .219
CA      2   1.96      2.35                  .201    .199
AE      2   1.96      2.35                  .213    .211

[...] The data: n samples consisting respectively of N₁, N₂, ..., N_n
cases. [...] occurrence of some [...]


3. Determine the average proportion of occurrence p for all samples
together by dividing the total number of occurrences by the total
number of cases.

4. Using this average value of proportion, p, find the standard error
of the difference between the extreme groups (SE_dn) from the usual
formula

$$SE_{d_n} = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_n}\right)}$$

5. Find the standard score z_n in the tables of the normal
distribution corresponding to the two-tailed probability of
2α/n(n-1), where α is the experimentwise rate of error desired for
the whole set of multiple comparisons.

6. Find the required difference (RD_n) by multiplying the standard
score value and the standard error, i.e.:

$$RD_n = SE_{d_n} \cdot z_n$$

7. If the difference between the highest and lowest proportions is
less than RD_n we conclude that there is no significant difference
among the samples and stop testing. If the difference of the extremes
is greater than this required difference, we conclude that the
extreme samples differ significantly, and proceed with further tests
as follows:

8. Test next the two subgroups obtained by eliminating one of the
extreme values, following the procedure of Step 9 for subgroups of
size k = n-1.

9. Procedure for testing subgroups of k samples: [...] difference,
obtaining [...]

10. If either of the subgroups of n-1 means is significant by the
above procedure we [...] subgroups of k = n-2 which are not contained
in an insignificant group of n-1 (Ryan, 1959b).
The above procedure [...] of means is based upon the distribution of
the “studentized range” (SR), i.e., the range divided by the standard
error of the mean. The difference required for significance by
Tukey’s method is called the WSD (wholly significant [...]). Assuming
that the dis[...] also necessary to allow for samples of unequal
numbers.⁵

When these adaptations are carried [...] previous pages:

1-4. Same as for RD.

5. Find the [...] the studentized [...] degrees [...]

6. Find WSD by multiplying SR by the standard error of the difference
divided by the square root of 2:

$$WSD_n = \frac{SE_{d_n} \cdot SR_n}{\sqrt{2}}$$

7-8. Same, except that WSD_n is used in place of RD.

9. (c) We find SR_k (by reading the tables of the studentized range
for n = k) and average this value with the one previously found for
n, i.e.,

$$SR'_k = \frac{SR_k + SR_n}{2}$$

(d)

$$WSD_k = \frac{SE_{d_k} \cdot SR'_k}{\sqrt{2}} = \frac{SE_{d_k}}{\sqrt{2}} \cdot \frac{SR_k + SR_n}{2}$$

10. Same.

⁴ [...] The Problem of Multiple [...] privately circulated monograph.
See also Ryan (1959a, 1959b).

⁵ The procedure for means as outlined by Tukey and in my previous
paper (Ryan, 1959a) assumed that all samples were of equal size.
[...] error of the difference [...] line for multiple comparison of
proportions (see Step 6, where the divisor, √2, is introduced because
of the standard error of the difference).
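
Steps 6 and 9 can be sketched as a short computation. The Python below is illustrative only: the function name is an assumption, and the studentized range values are the standard upper 5% points for infinite degrees of freedom, entered by hand as an assumed stand-in for the printed tables.

```python
import math

# Upper 5% points of the studentized range, infinite degrees of freedom
# (standard table values, entered by hand; an assumption of this sketch).
SR_05_INF = {2: 2.77, 3: 3.31, 4: 3.63, 5: 3.86}


def wsd(se_diff, k, n, sr_table=SR_05_INF):
    """WSD for a subgroup of k samples out of n: the standard error of the
    difference times the average of SR_k and SR_n, divided by sqrt(2)."""
    sr_average = (sr_table[k] + sr_table[n]) / 2.0
    return se_diff * sr_average / math.sqrt(2.0)


# Two rows of Table 1: B-E (k = 5) and B-A (k = 4), with n = 5 samples.
print(round(wsd(0.0952, 5, 5), 3))
print(round(wsd(0.0890, 4, 5), 3))
```

With the standard errors listed in Table 1 this reproduces the WSD values of about .260 for B-E and .236 for B-A.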


The example in Table 1 shows the computations for the WSD method in
the lower section. It will be noted that there is relatively little
difference in the results, but that the Tukey method yields slightly
smaller ranges necessary for significance (particularly at the
extremes) and is thus slightly more powerful than the method of
adjusted significance levels.
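
The upper section of Table 1 can be reproduced with a short program. The following Python sketch is a reading of the layer method as applied to proportions and not the author's own implementation: the function and variable names are assumptions, the average proportion p is taken within each subgroup tested (as the p column of Table 1 indicates), and the critical z comes from the standard library's normal distribution in place of printed tables.

```python
from math import sqrt
from statistics import NormalDist


def layered_proportion_tests(counts, sizes, alpha=0.05):
    """Layer method for proportions.

    counts and sizes map each sample label to its number of occurrences
    and its number of cases. Returns a dict keyed by the (lowest, highest)
    labels of each tested subgroup, with the value
    (required difference, observed difference, significant?).
    """
    prop = {s: counts[s] / sizes[s] for s in counts}
    labels = sorted(prop, key=prop.get)    # samples in order of proportion
    n = len(labels)
    results = {}
    nonsignificant = []                    # index ranges already found ns

    for k in range(n, 1, -1):              # subgroup size: n, n-1, ..., 2
        for lo in range(n - k + 1):
            hi = lo + k - 1
            # No tests are made inside a nonsignificant subgroup.
            if any(a <= lo and hi <= b for a, b in nonsignificant):
                continue
            group = labels[lo:hi + 1]
            # Average proportion within the subgroup being tested.
            p = sum(counts[s] for s in group) / sum(sizes[s] for s in group)
            low, high = labels[lo], labels[hi]
            se = sqrt(p * (1 - p) * (1 / sizes[low] + 1 / sizes[high]))
            level = 2 * alpha / (n * (k - 1))        # adjusted nominal level
            z = NormalDist().inv_cdf(1 - level / 2)  # two-tailed critical z
            required = z * se
            observed = prop[high] - prop[low]
            significant = observed > required
            results[(low, high)] = (required, observed, significant)
            if not significant:
                nonsignificant.append((lo, hi))
    return results


# Data of Table 1 (training methods B, D, C, A, E).
counts = {"B": 12, "D": 10, "C": 36, "A": 36, "E": 37}
sizes = {"B": 60, "D": 40, "C": 80, "A": 60, "E": 50}
results = layered_proportion_tests(counts, sizes)
for pair, (required, observed, significant) in results.items():
    print(pair, round(required, 3), round(observed, 2),
          "s" if significant else "ns")
```

Run on the Table 1 data, the sketch tests ten subgroups, finds the six subgroups of three or more samples significant, and finds none of the four adjacent pairs significant, as in the table.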

So far in this discussion of handling 
sample proportions, we have assumed 
that the sampling distribution is 
approximately normal, for both of 
the methods. In principle, the meth- 
od of adjusted significance levels 
could also be used in cases where the 
samples are too small to justify the 
normal approximation. In such a 
case the “Fisher Exact Test” could 
be applied to the extremes at the ad- 
justed significance level, and so on 
through the various layers. In prac- 
tice, however, this is not likely to be 
very useful because such small sam- 
ples would not be likely to permit 
the levels of significance which are 
required for multiple comparisons.


MULTIPLE COMPARISON OF VARIANCES

If we have computed the variances 
of a number of different samples, we 
may wish to draw conclusions about 
how the corresponding populations 
differ from one another in variability. 
The method of adjusted significance 
levels can also be used for this pur- 
pose. The outline of the procedure 
differs in detail, but not in principle 
from that which was given above for 
proportions. In describing the pro- 
cedure, we shall assume that the F 
test would be appropriate if only two 
samples were to be compared. 

Steps in computation:

1. Data: n samples of N₁, N₂, ..., N_n cases respectively; variance
estimates from each sample are s₁², ..., s_n².

2. Arrange the variances in order of magnitude, calling the smallest
s₁² and the largest s_n² (see Table 2 for illustration of the
method).

3. Test the extremes s₁² and s_n² by means of the F test at the
nominal level of 2α/n(n-1), using the degrees of freedom
corresponding to the two sample sizes. If the test is not significant
we proceed no further.

N.B.: Tables of the percentage points of the F distribution are
usually not extensive enough for this purpose, unless only a small
number of samples is involved. In Pearson and Hartley (1954) the
percentage points are 25, 10, 5, 2.5, 1, 0.5 and 0.1 per cent. Yet,
if we have 10 samples to compare and we wish to work at the level of
5% experimentwise the extreme values should be tested at the .1/90
level or .0011 (i.e., 0.11%). This value must be halved in using the
percentage points, since the tables are designed for the one-tailed
application of F which is appropriate to analysis of variance. The
best we could do, then, for 10 variances would be to work at the 10%
[...] In addition, we would not find the intermediate values which
are needed for other [...]

Other percentage points of F can be obtained from the Tables of the
[...] Function (K. Pearson [...])

[...] proportions, at the [...] level of 2α/n(n-2).

[...] of the tests in [...], continue with [...] tests are
insignificant for a given size of subgroup. In each subgroup, the
test is made at the level of 2α/n(k-1). Again it should be noted
that, once a particular subgroup is found nonsignificant, no tests
are made within that subgroup (Ryan, 1959a).

Note also [...] each test depends upon the degrees of [...]

TABLE 2
MULTIPLE COMPARISON OF VARIANCES AT 5% LEVEL EXPERIMENTWISE

                       Samples
                       B        D        A        C        E
Degrees of freedom     20       15       25       20       15
Variance estimate      10.07    20.35    22.46    39.21    45.63

Summary of Computations

Group      k   df      Nominal level   Nominal level   Required   Observed   Significance
                       2α/n(k-1)       α/n(k-1)        F          F
B-E        5   15,20   .00500          .00250          3.95       4.53       s
B-C        4   20,20   .00667          .00333          3.55       3.89       s
D-E        4   15,15   .00667          .00333          4.42       2.24       ns
D-C, A-E   3   not tested because D-E not significant
B-A        3   25,20   .01000          .00500          3.22       2.23       ns

No tests are possible for k = 2 since all tests for k = 3 are
insignificant.
Conclusions: There is evidence of only two populations [...]
variances. B belongs to one, C and E to the other; D and A may belong
to either.

[...]metric methods. [...]pose that we have [...] have been treated
by the Mann-Whitney [...] 1955; Siegel, 1956) if we had only two
samples, but we happen to have more than two to deal with. A
nonparametric “analysis of variance” technique is available (Kruskal
& Wallis, 1952; Siegel, 1956, pp. 184-194), but this provides only an
overall test and does not give us specific comparisons of samples.

TABLE 3
MULTIPLE MANN-WHITNEY TESTS AT 5% LEVEL EXPERIMENTWISE

                    Samples Arranged in Order of Medians
                    A      C      B      E      D
Observed Values     7      9      10     11     25
                    4      2      7      12     21
                    10     10     16     7      14
                    7      13     4      14     13
                    0      9      23     9      14
                    4      11     7      19     7
                    8      13     12     16     20
                    8      9      13     24     18
Median              7      9.5    12.5   13     16

Summary of Computations

Pair   k   Required α level   U (by inspection)   p (2-tailed)ᵃ   Significance
A-D    5   .0050              4                   .0020           s
C-D    4   .0067              8                   .0100           ns
A-E    4   .0067              5                   .0020           s
A-B    3   .0100              14.5                .0740           ns

Note. Group A is significantly different from Groups D and E. No
other differences are significant.
ᵃ Probabilities read from tables of Mann and Whitney (1947), also
given in Siegel (1956). Note that those tables give one-tailed
probabilities. For samples larger than 8, probabilities must often be
computed from the normal approximation, since percentage points
rather than the complete distributions are given.

To perform multiple comparisons, 
we use either the complete distribu- 
tion of Mann and Whitney’s U 
statistic, or the normal approxima- 
tion to this distribution. Only a 
general outline of the procedure will 
be shown, since it differs little from 
the previous applications. Table 3 
illustrates the computations. 

Steps in Computation:

1. Arrange the samples in order of size of the medians.

2. Compute U (the Mann-Whitney statistic based on overlapping of the
samples) for the two extreme samples.

3. Test the significance of U at the 2α/n(n-1) level, using the exact
distribution of U, if available, or the normal approximation of this
distribution.⁶ In the latter case the observed value of z is

$$z = \frac{U - \dfrac{N_1 N_n}{2}}{\sqrt{\dfrac{N_1 N_n (N_1 + N_n + 1)}{12}}}$$

where N₁ = number of cases in lowest sample and N_n = number of cases
in highest sample.

4. If the extremes are significantly different continue testing the
inner layers at the adjusted significance levels: 2α/n(k-1), where n
is the total number of samples and k is the size of subgroup being
tested.

5. Continue in the manner described for proportions and variances.

⁶ Available tables (Mann & Whitney, 1955; Siegel, 1956) give the
complete distributions up to samples of 8 cases. For larger samples,
only the percentage points are given, but the normal approximation is
fairly accurate.
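
Steps 2 and 3 can be sketched for the two extreme samples of Table 3. The Python below is illustrative only: the function names are assumptions, ties are counted as one half (which reproduces the tabled U values), and the normal approximation is applied without any correction for ties or continuity, so its probability differs somewhat from the exact tabled value.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(low, high):
    """Number of (low, high) pairs in which the value from the lower
    sample exceeds the value from the higher sample; ties count one half."""
    u = 0.0
    for x in low:
        for y in high:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def normal_approximation(u, n1, n2):
    """z for U under the null hypothesis, and its two-tailed probability."""
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return z, 2.0 * NormalDist().cdf(-abs(z))


# Extreme samples of Table 3 (lowest median A, highest median D).
sample_a = [7, 4, 10, 7, 0, 4, 8, 8]
sample_d = [25, 21, 14, 13, 14, 7, 20, 18]

u = mann_whitney_u(sample_a, sample_d)
z, p = normal_approximation(u, len(sample_a), len(sample_d))
print(u, round(z, 2), round(p, 4))
```

For samples A and D this gives U = 4, and the approximate two-tailed probability falls below the required level of .0050 for k = 5, in agreement with the first row of Table 3.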


SUMMARY


A general method for multiple comparisons which is applicable to a
variety of measures is presented. This method, called the “method of
adjusted significance levels,” makes use of the “layer” method of
testing and controls the rate of error experimentwise. In order to
illustrate the manner in which the method is adapted for use with a
particular statistic, detailed procedures are outlined for the
multiple comparison of proportions, variances, and for one example of
nonparametric comparison of sample medians (Mann-Whitney test).
Tukey’s method for multiple comparison of means which is based upon
the distribution of the “studentized range” is also readily adaptable
to the multiple comparison of proportions, and instructions for its
use are also included in this paper.


REFERENCES

FESTINGER, L. [...] between means without reference to the frequency
distribution function. [...]

KRUSKAL, W. H., & WALLIS, W. A. Use of ranks in one-criterion [...]


EXTRAVERSION-INTROVERSION AS A
DIMENSION OF PERSONALITY:
A REAPPRAISAL


PATRICIA M. CARRIGAN¹
University of Michigan 


Few personality constructs have 
remained as controversial and as pro- 
ductive of research over the years as 
extraversion-introversion. First es- 
tablished by Jung (1923) as a ‘‘di- 
mension” of the normal personality, 
the construct has since been extended 
in many directions; it has been linked 
with physiological processes and mor- 
phology, with perceptual and cogni- 
tive behavior, with sociocultural phe- 
nomena, with physical and psycho- 
pathological disorders of one sort and 
another. Early attempts to demon- 
strate these relationships produced 
little in the way of definitive results; 
researchers began to doubt the va- 
lidity of the construct, and in the 
early forties, it looked for a time as 
though extraversion-introversion had 
had its day. Like the proverbial bad 
penny, however, the construct has 
continued to turn up, notably in 
factor analytic studies, and over the 
past decade it has gradually been 
reinstated as an important focus in 
personality research. In a review of factorial studies of
personality, Eysenck (1953) observed that although

the picture is not as clear as one might wish ... its main outlines
are becoming more and more definite. ... At the type level, i.e., at
a level where concepts are based essentially on the intercorrelations
between traits, three main dimensions appear to have been
established: Neuroticism, Extraversion-Introversion, and
Psychoticism. These three dimensions appear to be relatively
orthogonal to each other, and also to “g” (Thurstone’s second-order
factor of cognitive functioning) (p. 318).

¹ The writer wishes to express her appreciation to E. Lowell Kelly
for his valuable advice and assistance in the preparation of this
paper, and to Warren T. Norman and Richard D. Mann for their reading
of the manuscript.


Eysenck’s conclusions as they ap- 
ply to extraversion-introversion em- 
brace two issues of longstanding con- 
cern—briefly, the unidimensionality 
of the construct, and its relationship 
to “neuroticism” or, more broadly, 
adjustment. These issues were not 
adequately resolved at the time of 
Eysenck’s review; they have gained 
importance in the years since, as a re- 
sult of renewed interest in extraver- 
sion-introversion. In this paper, the 
two issues will be examined in the 
light of more recent evidence, in an 
attempt to clarify the current status 
of extraversion-introversion as a per- 
sonality dimension. 


THE ISSUES 

Is extraversion-introversion a uni- 
tary dimension? Doubt concerning 
the unidimensionality of extraver- 
sion-introversion was a natural conse- 
quence of the conflicting results of 
early research; it was reinforced by 
the repeated finding of low to mod- 
erate correlations (averaging about 
.35) between various measures of
the so-called dimension (Bernreuter,
1934; Guilford & Hunt, 1932; Hovey, 
1929; Moore & Steele, 1934; Stagner, 
1932; Vernon, 1938). While recogniz- 
ing that the measures were partly at 
fault, investigators began to suspect, 
in addition, that they were not deal- 
ing with a single dimension. 
Unidimensionality is clearly im- 
plied in Eysenck’s conclusions, above; 
in support of his position, he points 
to factors in rating and questionnaire 
studies (and a few objective and pro- 
jective test analyses) which, though 
bearing different names, seem to re-
flect extraversion-like characteristics.
However, many inconsistencies can
be found in the factors, and empirical
evidence for their identity is virtually
nonexistent.

In the past few years, many psy- 
chologists have become increasingly 
convinced that extraversion-introver- 
sion is an important dimension of 
personality; yet there is curiously
little agreement as to its essential na-
ture. Cattell (1957b), for example, has
presented evidence to indicate that
extraversion-introversion is largely
of environmental origin; Eysenck
(1956a) is as firmly convinced by his
research that heredity [...] role. The
persistence of such discrepancies
strongly [...]

[...] maintaining [...] two dimen-
sions, others sharing Freud's (1920)
belief that introversion was a fore-
runner of neurosis. Researchers gen-
erally accepted Jung’s formulation,
but ran into difficulty when it came 
to measuring the two dimensions. 
Guilford (1934) pointed up the prob- 
lem, calling attention to 

the very troublesome situation found by those 
who construct tests of IE [introversion-extro-
version] and of “neurotic tendency,” a difficulty
in keeping the two types of tests from corre-
lating significantly with one another (p. 343).


Again, the measures were suspect, 
but with repeated attempts to im- 
prove them, measures of extraversion 
continued to correlate as highly with 
adjustment measures as they did 
with each other (Bernreuter, 1934; 
Vernon, 1938). Thus, the possibility 
of an intrinsic relationship between 
the two dimensions could not be ruled 
out. 

The problem has been less appar- 
ent in factorial research, where, as 
Eysenck has noted, orthogonal fac- 
tors resembling extraversion-intro- 
version and neuroticism frequently 
appear in the same analyses. In 
many instances, however, the char- 
acteristics associated with “introver- 
sion” continue to have a strong mal- 
adjustive flavor. 

Clarification of these issues must 
be sought in multivariate research, 
examined in the light of well-defined 
criteria for unidimensionality and 
factorial independence. The follow- 
ing criteria appear useful; they guide 
the presentation of evidence, below, 
and provide a framework for subse-
quent evaluation.

1. If extraversion-introversion is a
major, unitary dimension of person-
ality, (a) it should be represented as a
factor in all measures and media cov-
ering the personality domain, and
(b) the factors so obtained should be
interrelated.

2. If extraversion-introversion and
adjustment are independent dimen-
sions, (a) factors corresponding to
the two dimensions should be uncor-
related, and (b) to the extent that the
same variables appear on factors of
extraversion-introversion and adjust-
ment, indicators of “good” and
“poor” adjustment should as fre-
quently be associated with extraver-
sion as with introversion.
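
The second criterion can be given a simple numerical form. The sketch below is illustrative only: the function names, the factor scores, and the loadings are hypothetical and are not taken from any of the studies reviewed. It computes the correlation between two sets of factor scores (criterion 2a) and tallies how often “good” and “poor” adjustment indicators fall at each pole of an E-I factor (criterion 2b).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def pole_counts(loadings):
    """Tally adjustment indicators at each pole of an E-I factor.

    `loadings` maps a variable name to (loading on the E-I factor, kind),
    where kind is "good" or "poor". Positive loadings are treated as the
    extravert pole; zero or negative loadings as the introvert pole.
    """
    counts = {
        ("extravert", "good"): 0,
        ("extravert", "poor"): 0,
        ("introvert", "good"): 0,
        ("introvert", "poor"): 0,
    }
    for loading, kind in loadings.values():
        pole = "extravert" if loading > 0 else "introvert"
        counts[(pole, kind)] += 1
    return counts


# Hypothetical factor scores for six subjects (assumed values, not real data).
ei_scores = [1.2, -0.4, 0.3, -1.1, 0.8, -0.8]
adjustment_scores = [0.5, 0.6, -1.0, -0.2, -0.3, 0.4]
r = pearson_r(ei_scores, adjustment_scores)
print(f"r between E-I and adjustment factor scores: {r:.2f}")
```

Under criterion 2a the correlation should be near zero; under criterion 2b the four tallies returned by `pole_counts` should be roughly balanced across the two poles. How near, and how balanced, is a judgment the criteria themselves leave open.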


THE EVIDENCE 


In the present discussion, attention 
will be focused on research which has 
appeared since Eysenck’s 1953 re- 
view. However, exceptions will be 
made in the case of earlier studies 
which are particularly relevant to the 
issues outlined above. The evidence 
to be considered comes chiefly from 
the researches listed (with code sym- 
bol and reference citation) in Table 1. 


Analyses of Ratings 


Few factorial rating studies have 
been published in recent years; the 
one contribution of interest here is a 
second-order analysis by Cattell 
(CaD) of “life-record” data, based on
observations of behavior in life situa- 
tions. Cattell’s second-order extra- 
version factor has positive loadings 
on F, Surgency (.70); E, Dominance 
(.54); A, Cyclothymia (.38); and H, 
Parmia (.17), the latter associated 
with gregarious sociability and im- 
pulsiveness. At the introvert pole, 
the factor is defined by M, Autia 
(—.54), linked with extreme subjec- 
tivity and “inner mental life.” 

The results of this analysis are es- 
pecially noteworthy, in view of Cat- 
tell’s longtime insistence that extra- 
version-introversion (E-I) was noth- 
ing more than a broad cluster of re- 
lated trait elements, and, as such, 
not a very useful construct (Cattell, 
1945, 1946, 1950). Having discovered 
second-order E-I factors in rating and 
questionnaire data, he now suggests 
that 


it is perhaps worth while to make a deter-
mined attempt to rescue the label “extravert-
vs-introvert” from the scientific disrepute and
uselessness into which it has fallen through 
popular adoption (Cattell, 1957b, p. 267). 


Analyses of Questionnaires 


Much of the factorial research 
relevant to E-I is based on question- 
naires which evolved from a series of 
early factorial studies by Guilford and 
Guilford (1934, 1936, 1939a, 1939b). 
Among these questionnaires are Guil- 
ford’s Inventory of Factors STDCR 
(1940); the Guilford-Martin Inven- 
tory of Factors GAMIN (1943a); the 
Guilford-Martin Personnel Inven- 
tory (1943b); and the Guilford-Zim- 
merman Temperament Survey, or 
GZTS (1949), which incorporates 10 
factors from the preceding inven-
tories.


Analyses of the Guilford 
Questionnaires 


It can be seen in Table 2 that an- 
alyses of the Guilford questionnaires 
have consistently yielded E-I factors 
defined by a similar pattern of vari- 
ables. Factors obtained by Denton 
and Taylor (De) and by North (No), 
in analyses of the STDCR inventory, 
have their principal loadings on R, 
Rhathymia (freedom from care) and 
S-, Social Extraversion.² R and S-
also appear on Lovell’s factor (Lo) 
along with G, General Activity, and 
A, Ascendance, from the GAMIN in- 
ventory. The same four variables are 
distributed on three of the factors ob- 
tained by Thurstone (Thu) in a re- 
analysis of Lovell’s data. A second- 
order analysis of Thurstone’s matrix 
by Baehr (Ba) pulls together R, S-, 
G, and A on an extraversion-like 
factor, Primary Function, which is 
defined by Thurstone’s Factors VII, 
Impulsivity (.85) and V, Dom- 

2 Denton and Taylor's factor also has a
loading of .29 on an objective test factor
called Verbal Versatility.
TABLE 1
MAJOR CHARACTERISTICS OF MULTIVARIATE STUDIES RELEVANT TO
EXTRAVERSION-INTROVERSION (E-I)
Investigator    Symbol    Sample    Number and Type of Variables (a)    Type of Analysis (b)    Number of Factors
aaa USSA ra (Gameias' Phu; below) 902 Second-order, oblique 6 
TE ät 8 
i 32 male, 30 female col- 25 T° | First-order, orthogonal 
Becker (1959) Be | gee sundents 159 
Cattell (1955) CaA | 250 USAF pilot trainees 64T First-order, oblique 15 dM 
CaB | 500 USAF pilot trainees is #3 First-order, oblique 16 
, ique 4 E 
iG 181 male & female col- 15Q Second-order, oblique 
Cattell (1956) sa lege students, | 
227 USAF trainees 
a CaD 544 male & female col- 12 R Second-order, oblique 6 
Cattell (19576) ia lege students | 
Cook & Wherry (1950) Co et awa submarine n 2 First-order, orthogonal 6 
2C Í 
1R 
Denton & Taylor De 170 high school seniors 8T Second-order, oblique 6 
(1955) 5Q j 
z 
E ck (1956; Ey 104 twins (13 pairs each 347 First-order (rotational cri- 6 
EE (LENN) male identical & frater- 11 PT | teria not Ai e iş 
nal, female identical & 2R 
fraternal) 2 SR 
1Q 
Foster (1955) FoA 54 state highway patrol 8 PT First- 4 
eae T ìrst-order, orthogonal d 
IE i 
FoB 28 college student vol- 8 irst- 
aar $ 2. First-order, orthogonal 5 
41 
3T 
Franks, Souief, & Fr 100 male, 100 female 7Q First. ale 
Maxwell‘! adult volunteers st-order, orthogonal g 
Guilford & Gu (Same as Lo, below) 69 Q First-order, orthogonal 18 
Zimmerman (1956) (Te, cosine-pi approxima- y 
tion) 
f 
Heron (1954) He 80 male unskilled fac- 19T First-order, orthogonal 4 
tory workers 4 L (Burt's simple summation) 
IR 
Hildebrand (1958) Hil | 95 male neurotics 15 és First-order® a 
B § 
Hi elwei mas A A n 
rane nogal, Him | 64 male surgical Patients 16T First-order orthogonal (r;; 2 
1Q unrotated) 
1G 
Ki ra > ; 7 
arson & Pool (1957b) | Kad ZA. maladjusted USAF 300 Correlational = 
Karson & Pool 1958) K: $ 7 i 
(1958) | Kap Tretaladiusted USAF | 169 Second-order, orthogonal 6 
Kassebaum, Couch, ale e 
& Slater (1980) Kas 160 male college fresh- 32 Q First-order, orthogonal 3 Æ 
Lovell (1945) L 
low Tale, 78 female col- 13Q Second-order, orthogonal 6 
^ Classified as follows: C, clinical observation; £ 


naire; R, behavioral rating 

» Unless otherwise noted, al 
quently rotated for simple stru 
retest measures, 5 
d Unpublished study, 195; 


Plus 7 


+ SR, self-rating; T, ol 


interest or a 
jective test. 


l factorizations by y 
cture. Seay 


Š random variables, 6 items 
° Not rotated for simple structure; rotational criteria 


described in text, 


ttitude inventory; 


PT, projective test; Q, question- 


ith centroid analyses based on Pearson r, factors subse- 
of background information, 


TABLE 1 (Continued)
Number and Num- 
Investigator Symbol Sample Type of Type of Analysis ber of 
Variables Factors 
R. D. Mann (1958) MaA | 100 male college students 26 2 Second-order,! orthogonal 7 
SI 
1R 
R. D. Mann? MaB | 100 male college students 26 Q Second-order, orthogonal 5 
MaC | 100 female college stu- 26 Q Second-order, orthogonal 5 
dents 
Nelson & Shea (1956) Ne 19 male, 33 female col- 15Q Correlational — 
lege students 
North (1949) No 155 male, 15 female col- 5Q Second-order, orthogonal 2 
lege students 
Royal (1950) Ro 100 male college students 2 Ha Correlational (rps) ee 
Scheier & Cattell Se 86 male college students 90 T First-order, oblique 15 
(1958) 17Q 
6R 
Singer, Wilensky, Si 100 male schizophrenics 9 PT First-order, oblique (rọ); 4 
& McCraven (1956) 3 a second-order, oblique 2 
1 SR 
Thornton & Tho 75 male, 25 female col- 5Q Correlational — 
Guilford (1936) lege students 5 PT 
Thurstone (1951) Thu (Same as Lo, above) 13Q First-order, oblique 9 
Tyler (1951) Ty 107 female graduate 15Q First-order, orthogonal & 5 
students oblique 
Welsh (1956) We 150 male VA medical & 169 First-order, orthogonal 3 
surgical patients (unrotated) 
Wheeler, Little, Wh 112 male college students 120 First-order, orthogonal 4 
& Lehner (1951) (matrix 1) 
Williams & Wi 100 male VA neuropsy- 17 PT First-order, orthogonal & 4 
Lawrence (1954) chiatric patients 140 oblique (r: for T & PT 
1T variables) 
Wood (1957) Wo 56 male & female college 18 Q Correlational = 
students 


f With respect to Q variables only. However, factors discussed here have no important loadings except on Q
variables, hence are essentially second-order factors.
g Unpublished analyses, Univer. of Michigan, 1959. Based on data obtained by Weitzenhoffer (1956).


inance (.80). However, Thurstone’s 
first factor, Reflectiveness, with its 
principal loading on T, appears in- 
stead on Baehr’s Emotionally Un- 
stable factor. From this analysis— 
and from the preceding ones—it 
looks as if T, Thinking Introversion, 
is essentially a maladjustment fac-
tor,³ and that the core of E-I as meas-
ured by the Guilford questionnaires
consists of Factors R, S, G, and A. 
A question about the relationship 
of R to extraversion has been raised 


3 However, its GZTS counterpart, Thought- 
fulness, loads several extraversion-like factors 
obtained in joint analyses of the Guilford and 
Cattell questionnaires, discussed subsequently. 


by Guilford and Zimmerman, who 
have recently carried out another 
analysis (Gu) of Lovell’s data. In 
order to have several variables repre- 
senting each factor, they divided each 
of the factor scales into three or more 
short “tests,” by sorting the items in-
to apparently homogeneous sub-
groups. Sixty-nine “tests” or varia- 
bles were obtained in this manner; 
another—the subject’s sex—was 
added. The matrix of intercorrela- 
tions for the 70 variables yielded fairly 
good approximations of the 13 origi- 
nal questionnaire factors, along with 
a second C factor—C2—and four 
residuals. Minor changes in meaning 
were indicated for several factors, and
rather substantial ones for R. Half of 
the “tests” from the R scale went to 
other factors: reticence to A, impul- 
sivity to C2, and rapport with the en- 
vironment to O. In view of these 
modifications, particularly the last, 
Guilford and Zimmerman have ques- 
tioned the relationship of R to Jung’s 
extraversion, with which it has gen-
erally been identified. However, the
remaining attributes of R (carefree-
ness, unconcern, and liking for ac-
tion, along with the cheerfulness and
energy formerly associated with Fac-
tors D and G) seem in a broad sense,
at least, to be consistent with extra- 
version. 


Analyses of the 16 PF Test 


Cattell’s E-I questionnaire factor 
emerged from a second-order anal- 
ysis (CaC) of the Sixteen Personal- 
ity Factor Questionnaire, or 16 PF 
test (Cattell, 1957a). This factor, 
shown in Table 2, is similar to the 
previously discussed rating factor, 
differing chiefly in the omission of E, 
Dominance, and the addition at the 
introvert pole of two primary factors 
unique to questionnaire data: Q1,
Radicalism, and Q2, Self-Sufficiency. 

The 16 PF extraversion factor 
obtained by Karson and Pool (KaB) 
resembles Cattell’s in F, Surgency, 
and A, Cyclothymia, but the two 
factors are otherwise quite different. 
As seen in Table 2, Karson and Pool’s 
factor adds E, drops Q1, and has a 
negligible Q2 loading; more im-
portant discrepancies are found in H,
Parmia, and M, Autia. Cattell’s
factor has its highest loading on M,
and a relatively small one on H; Kar-
son and Pool’s E-I factor, on the other
hand, has its highest loading on H
and no loading on M, which appears
on their anxiety factor (.72). Sim-
ilarly, M contributes little to the ex-
traversion-like factors obtained in


joint analyses of the Guilford and 
Cattell questionnaires, but in three 
of these analyses (MaA, MaB, MaC) 
it has substantial loadings—.44, 09, 
54—on maladjustment. Further- 
more, Wood’s 16 PF intercorrela- 
tions (Wo) show M to be virtually 
uncorrelated with the other extra- 
version primaries, but closely related 
to the major components of Cattell’s 
second-order anxiety factor (L, O,
Q3, Q4). Contrary to Cattell’s re- 
sults, then, these studies suggest that 
M is primarily a maladjustment fac- 
tor. 

The various studies do differ in sev-
eral respects, and while the discrep-
ant results are not adequately ac-
counted for by these differences, it is
well to mention them. In the first 
place, Cattell’s factors are oblique;
Karson and Pool’s factors (and fac-
tors from the joint analyses) are or-
thogonal. The use of different rota-
tional criteria might be expected to
result in somewhat different factor 
patterns; it is not a sufficient ex- 
planation, however, for the correla- 
tion matrices themselves are quite 
dissimilar. Cattell’s matrix shows M, 
for example, to be a relatively inde- 
pendent factor, having its highest 
correlation (—.36) with F. On the 
other hand, Karson and Pool, Mann, 
and Weitzenhoffer (1956)—whose 
matrices were used in Mann’s B and 
C analyses—found M to be sub- 
stantially correlated with several fac- 
tors, notably the adjustment pri- 
maries. It should be noted, too, that 
Cattell’s matrix consists of correla- 
tions between the primary factors, the 
others of correlations between factor 
scores. However, for the sample on 
which Cattell’s analysis is based, the 
correlations between factor scores do 
not differ greatly from the primary 
factor intercorrelations (Cattell, 
1957a). Finally, the various analyses 
are based on somewhat different pop-
ulations, i.e., Karson and Pool’s on
Air Force personnel, Mann’s on col-
lege students, Cattell’s on a combined
group of college students and Air
Force trainees. On the basis of popu-
lation differences, then, it would be
expected that the greatest discrep- 
ancies would be found between Kar- 
son and Pool’s analyses and Mann's. 
Quite the contrary, these studies 
yielded the most comparable inter- 
correlations, and the Karson and Pool 
E-I factor is closely paralleled by 
one factor from each of Mann’s anal-
yses (MaA and MaB III, MaC II).
Moreover, these studies unanimously 
fail to support M as a major E-I vari- 
able. Thus, while some of Cattell’s 
primary factors—notably F and H— 
seem well-established as nuclear parts 
of the extraversion pattern, the role 
of M remains unclear. 


Joint Analyses: Guilford and Cattell 
Questionnaires 


Inasmuch as the questionnaires of 
Guilford and Cattell cover a wide 
range of personality characteristics, it 
might be expected that the measures 
would overlap to some extent, and 
that the two sets of extraversion 
factors would be closely related. The 
nature of the relationship can be 
seen in Table 3, in the GZTS-16 
PF intercorrelations obtained by 
Weitzenhoffer. It is interesting to 
note that the questionnaire scales 
with consistently high loadings on 
E-I factors (Guilford’s R, S, G, and
A, and Cattell’s F and H) form a
highly correlated “cluster,” and that 
except for E, Dominance, the re- 
maining extraversion primaries from 
the two inventories are only tan- 
gentially linked with the cluster. 

Of greater interest is what happens 
to the cluster when the intercorrela- 
tions for the Guilford and Cattell 
scales are jointly factored. Relevant 
factors from Mann's analyses (MaA, 


MaB, MaC), and Becker’s (Be), ap- 
pear in Table 2. One of the first 
things to be noted is that only one of 
the joint analyses yielded a factor 
which clearly corresponds to the 
cluster described above: Factor II in
Analysis MaC, which has its principal 
loadings on 16 PF E, F, and H, and 
GZTS G, R-, A, and S. The MaA 
and MaB analyses split the cluster 
and distributed its variables on two 
factors: Factor III, Social Extrover-
sion, which combines GZTS S, G, and
A with 16 PF H and E, and Factor 
IV, Lack of Self-Control, which links 
GZTS R- (and T-, one of the “fringe” 
variables) with 16 PF F. Becker’s 
factor seems most closely related to 
the latter, by virtue of its loadings on 
Guilford’s R- and T- and 16 PF G-, 
Lack of Internal Standards. Unfor- 
tunately, comparison is hindered by 
the fact that 16 PF A, E, F, and H 
are represented by a single score in 
Becker’s analysis (see footnote, Table 
2). Finally, looking again at the MaC 
analysis, it will be noted that Factor 
II, despite its sizeable loadings on all 
of the cluster variables, is most heav- 
ily weighted by GZTS S, G, and A, 
and 16 PF H; in short, it is most 
similar to the Social Extroversion 
factors from the MaA and MaB anal-
yses. MaC Factor III-, with its
GZTS R- and T- loadings, and Factor
IV, defined principally by 16 PF 
G- and Q3-, Lack of Will Control, 
may be a further split of the Lack of 
Self-Control factors obtained in the 
MaA and MaB analyses. 

From Mann’s analyses, then, it ap- 
pears that two or more factors are re- 
quired to account for the intercor- 
relations between E-I variables from 
the Guilford and Cattell question- 
naires. Moreover, the factors show 
remarkably little overlap; only F, 
Surgency, has loadings as great as .30 
on the two factors from the MaA and 
MaB analyses. It would seem, there- 
TABLE 3
INTERCORRELATIONS BETWEEN E-I VARIABLES:
GZTS AND 16 PF QUESTIONNAIRES
(From Weitzenhoffer, 1956)


GZTS 
Variable ees eae a 
G R A S P 
16 PF, 
A 16 —15 19 28** —05 
07 ei Mies 20* 29%% —05 
E 29** <I5** 44** 37** Fed 
39** —07 47** = —04 
F 30** —60** 26** 44** Er 
40** —54** 28** . a1** —29** 
H 47** —36** 74** 74** 06 
45** —40** 69** a Hag —02 
M 10 06 04 —18 K Y i 
os —30** 17 05 15 
Q1 13 29** 19 —14 37** 
06 06 11 06 40** 
Q2 —19 23* —11 —45** 2a" 
—08 26"* —28** —44** 17 


Note. Intercorrelations based on 100 males, 100 females, respectively. Italicized coefficients indicate reversals
of expected sign.
* Significant at .05.
** Significant at .01.
fore, that these factors represent rela- 
tively distinct dimensions. 
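
The overlap criterion used here (loadings as great as .30 on both factors) is easy to state as a procedure. The following is a minimal sketch under stated assumptions: the function name and the loadings are hypothetical, chosen only to mirror the pattern described in the text, and are not the values reported in Table 2.

```python
def shared_variables(factor_a, factor_b, cutoff=0.30):
    """Return variables whose absolute loading reaches `cutoff` on both factors.

    Each factor is a dict mapping a variable name to its loading. Variables
    present on only one factor are ignored.
    """
    common = set(factor_a) & set(factor_b)
    return sorted(
        name
        for name in common
        if abs(factor_a[name]) >= cutoff and abs(factor_b[name]) >= cutoff
    )


# Hypothetical loadings (assumed for illustration, not taken from Table 2).
social_extroversion = {
    "S": 0.80, "G": 0.55, "A": 0.60, "H": 0.75, "E": 0.45, "F": 0.35, "R-": 0.10,
}
lack_of_self_control = {
    "R-": 0.70, "T-": 0.50, "F": 0.40, "S": 0.05, "H": 0.10,
}

print(shared_variables(social_extroversion, lack_of_self_control))
```

With these assumed values only F clears the cutoff on both factors. The .30 cutoff is the one named in the text; a different cutoff would change which variables count as shared.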

As to the nature of the dimensions, 
Mann (1958), in a discussion of his 
MaA factors, has suggested the possi- 
bility that 
Factor III corresponds to the American con- 
ception of extroversion, with its emphasis on 
sociability and ease in interpersonal rela- 
tions, while Factor IV corresponds to the 
European conception of extroversion, with 


its emphasis on impulsiveness and weak super- 
ego controls (p. 108). 


Mann’s distinction appears to be a 
valuable one; perhaps, however, it 
can be more precisely tied down in 
terms of the major variables defining 
the two factors. 

Looking first at Social Extrover- 
sion, the vitality and enthusiasm 
associated with GZTS G, the aspira- 
tion to leadership and interpersonal 
interaction reflected in A and 16 
PF E, the seeking of (and pleasure 
in) social contacts described by 
GZTS S and 16 PF H, all appear 


to be ingredients of response to the 
environment and its “objects,” i.e.,
people. A dimension described by 
these variables might then be broadly 
conceptualized as one of response to 
external stimuli, with the extremes 
characterized as approach vs. avoid- 
ance. Thus defined, Social Extrover- 
sion would seem to approximate 
Jung’s (1923) conception of extraver-
sion, the essence of which is the rela-
tive importance accorded the “ob-
ject” and objective events. The
negative pole of the factor might like- 
wise be identified with Jung’s intro- 
version—emphasis on the self and 
inner, subjective processes—to the 
extent that avoidance of the external 
world can be viewed as a consequence 
of such self-preoccupation. 

Mann’s Lack of Self-Control fac- 
tor, on the other hand, suggests a 
very different conception of E-I. 
Among the variables defining this 
factor, GZTS R- contrasts happy-go-
lucky unconcern with seriousness and
self-control; T- is associated with
mental disconcertedness, as opposed 
to reflectiveness and self-observation; 
16 PF F reflects carefreeness vs. 
introspectiveness and brooding; G- 
is associated with lack of dependabil- 
ity and indolence, as opposed to per- 
severance and conscientiousness; Q3- 
contrasts laxity with control. The 
essence of these characteristics seems 
to be their relevance to the handling 
of impulses; the dimension they de- 
scribe might be thought of as one of 
response to stimuli arising from 
within. Viewed in this way, Mann’s 
Lack of Self-Control factor is readily 
identified with Eysenck’s conception 
of E-I. In his Dynamics of Anxiety 
and Hysteria, Eysenck (1957) char-
acterizes the neurotic extravert as
undersocialized (schematically, id
+ ego > super-ego); the neurotic in-
trovert, on the other hand, is de-
scribed as oversocialized (super-ego
+ ego > id). An empirical link with
Eysenck’s viewpoint is provided by 
the high R loading on Lack of Self-
Control. Eysenck considers R to be a
good measure of his dimension; he 
has used it both as a research crite- 
rion and as the basis for the Extraver- 
sion (E) scale in the Maudsley Per- 
sonality Inventory (Eysenck, 1956b). 
Of further interest is the fact that 
until recently, at least, Eysenck has 
been unwilling to include sociability 
as part of his extraversion constella- 
tion. In view of the independence of 
Social Extroversion and Lack of Self- 
Control, it appears that he may have 
been quite correct.

4 F also contrasts enthusiasm, cheerfulness,
and talkativeness with incommunicativeness,
a contrast more relevant to response to the
environment. That F loads both factors is
thus not surprising.

Several implications can be drawn
from the joint analyses reported here.
One concerns the relationship be-
tween E-I and adjustment. On the
basis of the MaA analysis, a good case
can be made for identifying Social 
Extroversion as a factor of “well-
adjusted” extraversion. It can be
seen in Table 2 that the factor (III) 
tends to have positive loadings on 
variables associated with “good”
adjustment, negative ones on vari- 
ables related to maladjustment. On 
the other hand, the MaB and MaC 
counterparts (Factors III and I, 
respectively) do not reflect this 
tendency. It should be pointed out, 
however, that the highest-loading 
variables on Social Extroversion— 
GZTS S and 16 PF H—have small 
but consistently positive loadings on 
adjustment factors, in every analysis 
which included them. It cannot be 
denied, moreover, that in a culture 
such as ours, which places a high pre- 
mium on interpersonal interaction, 
the characteristic avoidance of such 
interaction—associated here with 
social introversion—might be con- 
sidered maladaptive. 

On the other hand, if Mann’s Lack 
of Self-Control factor is correctly 
identified with Eysenck’s dimension, 
it would appear that both extremes 
of this factor are linked with malad- 
justment. Presumably the individual 
whose ego mediates a more harmoni- 
ous relationship between the expres- 
sion and control of impulses—i.e., 
the individual falling near the middle 
of the dimension—would be better 
adjusted than individuals at either 
extreme. However, to the extent that 
society rewards self-control and con- 
formity to cultural standards, the 
factor might be looked upon as con- 
trasting maladjusted extraversion 
with well-adjusted introversion. The 
latter interpretation is favored by the 
MaA analysis, where Factor IV tends 
to have positive loadings on “mal-
adjustment” variables, negative ones
on variables reflecting “good” adjust- 
ment (see Table 2). Again, however, 
the MaB and MaC analyses do not
concur. 

Further implications stem from 
the independence of the two factors. 
While the relationship to adjust- 
ment requires further clarification, if 
it should turn out that Social Extro- 
version and Lack of Self-Control do 
reflect well-adjusted and maladjusted 
extraversion, respectively, the lack 
of overlap in the two factors might 
suggest that extraversion and intro- 
version are differentially manifested 
in individuals falling at opposite ends 
of the adjustment continuum. Dis- 
crepancies between the MaA and 
MaB factors, based on male subjects, 
and the MaC factors, based on female 
subjects, suggest further a qualitative 
sex difference in E-I. A final implica- 
tion concerns the unidimensionality 
of E-I. In view of the independence 
of Mann’s factors, it is quite clear 
that the dimensions they represent 
cannot be subsumed under the same 
label. 


Analyses of the MMPI 


With a few exceptions (Abrams, 
1949; Cottle, 1950; Wheeler, Little, & 
Lehner, 1951 [Matrix 2]), factorial 
studies of the Minnesota Multiphasic 
Personality Inventory (MMPI) have 
consistently yielded bipolar factors 
with contrasting loadings on Ma, 
Hypomania, and D, Depression. 
That these factors (Table 4) may be 
related to E-I is suggested by several 
analyses, in which the MMPI clinical 
scales have been supplemented by 
various “personality” scales de- 
veloped for the inventory. 

Two factors from Tyler's analysis 
(Ty) are relevant. As Table 4 shows, 
Factor II, a “hysteroid” conflict 
factor, adds to the Ma-D contrast a 
dimension of responsibility (Re) not 
uncommonly associated with intro-
version. The appearance of Hy, Hys-
teria, at the “introvert” extreme re-
quires comment, however. Accord-
ing to Eysenck’s theory,⁵ hysteria is
associated with extraversion, and the 
negative Hy loading—here, and on 
several other factors in Table 4— 
would thus seem to be inconsistent. 
It will be seen later, however, that 
the MMPI Hy scale is essentially un-
related to Eysenck’s E-I dimension.
Of the remaining variables defining 
Tyler’s second factor, the prominence 
of Pt, Psychasthenia, and Sc, Schizo-
phrenia, might suggest that the fac-
tor is one of “maladjusted” introver-
sion, but the overall resemblance of
the factor to E-I is not impressive. 

Tyler's third factor, Social Ag- 
gressiveness, differs somewhat in the 
orthogonal and oblique rotations. 
The oblique factor links Ma with Do, 
Dominance, and St, Social Status, 
and has its highest loading on an- 
other scale suggestive of Eysenck’s 
extraversion—Pd, Psychopathic De- 
viate. D does not appear on the 
factor, but the variables defining the 
negative pole do not seem incon- 
sistent with introversion. The orthog- 
onal factor, on the other hand, is less 
well defined, and the substantial Pt
and Sc loadings suggest that it would 
have to be looked upon as a factor of 
“maladjusted” extraversion. In gen- 
eral, Tyler’s analysis seems to con- 
firm the presence of an extraversion- 
like dimension in the MMPI, but the 
exact nature of the dimension is by 
no means clear. 

Welsh’s analysis (We) is not 
strictly comparable to other MMPI 
analyses. It is based chiefly on prime 
scales—modified versions of the orig- 
inal scales containing no multiple 
scored items and, hence, not subject 
to the spurious intercorrelation in- 
troduced by item overlap. Several 
other special scales are included: 

5 The reader unfamiliar with Eysenck’s the-
ory will find a brief discussion later in this
paper.
Gm, consisting of items scored for at
least three MMPI scales and thought, 
therefore, to reflect some general 
MMPI dimension, presumably mal- 
adjustment; Ja, a rational scale of 
anxiety; and three empirical scales— 
A, Anxiety, M, Mania, and R, Re- 
pression—made up of items found to 
differentiate between subjects scoring 
at the high and low extremes on the 
Gm, Ma’, and D’ scales, respec- 
tively. 

Welsh’s second factor contrasts 
Ma’ and M with D’ and R, and has a 
small loading on Si’, Social Introver-
sion; it also brings in at the “intro-
vert” pole the rationally-derived mal-
adjustment scales (Gm, Ja), although
Pt’ and Sc’ do not appear on the fac- 
tor. Inasmuch as Welsh’s factors are 
unrotated, the analysis as it stands is 
not very satisfactory. It does, how- 
ever, provide added evidence for an 
E-I dimension in the MMPI—a 
dimension which apparently trans- 
cends item overlap in the scales. 

By far the most impressive results 
are those from Kassebaum, Couch, 
and Slater’s analysis (Kas). Their 
second factor, Introversion-Extra- 
version, links D with R, Si, Re, and 
L, Lie Score; Ma appears at the ex- 
travert pole, along with Im, Im- 
pulsivity, and several scales sugges- 
tive of a “social” orientation. The 
factor thus incorporates several scales 
associated with the other factors in 
Table 4, and adds a number of “per-
sonality” scales which further iden-
tify it as an E-I factor. 

In their discussion of the factor, 
Kassebaum et al. have called atten- 
tion to the fact that two of the scales 
defining introversion—R and L— 
consist solely of items scored for a 
“False” response; a third “introver- 
sion” scale, D, likewise has a pre- 
ponderance of “False” items, whereas 
Ma, associated with extraversion, 
contains significantly more “True”
items. On the basis of these facts, the
writers suggest the possibility that

what we have labeled extraversion is associ-
ated with a general tendency to agree with
any item whatever the content, while what
we have called introversion involves a con-
verse tendency to disagree or mark False
(Kassebaum, Couch & Slater, 1959, p. 230).


The extent to which such a “response
set” may be involved in the various
E-I questionnaire factors will be con- 
sidered later. 

It can be seen in Table 4 that 
most of the MMPI scales contribute 
substantially to maladjustment fac- 
tors. Hence, in Kassebaum, Couch, 
and Slater’s analysis, all of the E-I 
variables except R and L have load- 
ings of .48 or above on Factor I, Ego 
Weakness. The nature of the rela- 
tionship between these two “dimen- 
sions” is clarified to some extent by a 
further step in the analysis. Kasse- 
baum and his colleagues reasoned 
that if their first two factors were cor- 
rectly interpreted as Ego Weakness 
(maladjustment) and Introversion- 
Extraversion, it should be possible to 
identify more precisely the character- 
istics of “normal” and “disturbed” 
extraversion and introversion by ro- 
tating the axes 45 degrees and redefin- 
ing the factors in their new positions. 
The axes were shifted accordingly, 
yielding two fusion factors, so named 
because they were thought of as 
combinations of the primary refer- 
ence axes. Fusion Factor A, contrast- 
ing maladjusted introversion with 
normal extraversion, was labeled So- 
cial Withdrawal vs. Social Participa- 
tion. It had its principal loadings on 
Si, D, and Fm, Feminine Masochism, 
and, at the negative pole, on the “so-
cial” scales Sp, Sy, and St. Fusion
Factor B, Impulsivity vs. Intellectual
Control, contrasted maladjusted ex-
traversion with well-adjusted intro-
version. Its largest loadings were on
Im, Ma, and, negatively, on Re, To,
Tolerance, and Ac, Achievement via
Conformance. As would be expected, 
the two factors shared substantial 
loadings on a number of scales related 
primarily to maladjustment. 
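
The 45-degree rotation described above can be illustrated numerically. What follows is a minimal sketch in Python, not a reconstruction of the computations carried out by Kassebaum et al.: the function name `rotate_loadings`, the scale names, and all loading values are hypothetical, and the sign conventions for the two new axes are assumed for illustration only.

```python
import math


def rotate_loadings(loadings, degrees=45.0):
    """Re-express two-factor loadings on a pair of axes rotated by `degrees`.

    loadings: dict mapping a scale name to (loading on Factor I,
              loading on Factor II), the two orthogonal reference axes.
    Returns a dict mapping each scale to (loading on axis A, loading on axis B),
    where A lies midway between the reference axes and B is orthogonal to A.
    """
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    rotated = {}
    for scale, (f1, f2) in loadings.items():
        a = f1 * c + f2 * s    # projection on the rotated axis A
        b = -f1 * s + f2 * c   # projection on the rotated axis B
        rotated[scale] = (round(a, 3), round(b, 3))
    return rotated


# Hypothetical loadings, invented for illustration (not taken from any study).
example = {
    "scale_x": (0.50, 0.50),   # loads equally on both reference factors
    "scale_y": (0.50, -0.50),  # opposite sign on the second reference factor
    "scale_z": (0.60, 0.00),   # loads on the first reference factor only
}

for scale, (a, b) in rotate_loadings(example).items():
    print(scale, a, b)
```

In this sketch a scale loading equally on both reference factors falls entirely on axis A, whereas a scale loading on the first reference factor alone divides its variance equally between A and B. The latter case is one way of seeing why scales related primarily to maladjustment would be shared by both fusion factors.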

The conceptual similarity of the 
fusion factors to Mann’s Social Ex- 
troversion and Lack of Self-Control 
is apparent. And, just as Mann’s 
analyses split the cluster of extraver- 
sion variables from the GZTS and 16 
PF questionnaires, the two fusion 
factors obtained by Kassebaum et al. 
show a clear separation of the MMPI 
scales contributing to their E-I 
factor, with the exception of R, Re- 
pression, which has sizeable loadings 
on both A and B. Again, there seems 
to be a clear implication that differ- 
ences in adjustment may be associ- 
ated with qualitatively different mani- 
festations of extraversion and intro- 
version. 


Joint Analyses: MMPI and the 
Factorial Questionnaires 


Relationships between the MMPI
and the Guilford and Cattell ques-
tionnaires have been explored in sev-
eral studies. A recent analysis by
Franks, Soueif, and Maxwell (Fr)
is based on seven scales from the
MMPI and STDCR inventories,
selected as likely measures of E-I or
neuroticism: Guilford’s D, C, and R
scales, and MMPI K, Hy, Pd, and
[...]. The second and third factors
ob[tained] [...] Extraversion-Introversion.
[...] III, Rhathymia, has its principal
loading on R.
This study raises several questions
which, unfortunately, go unan-
swered.6 Nowhere do the writers offer
an explanation for the seemingly ar- 
bitrary rotation of Factors II and III. 
The deliberate elimination of R from 
the E-I factor is puzzling; Franks 
himself has used R as a measure of 
extraversion (Franks, 1956; Franks 
& Laverty, 1955; Laverty & Franks, 
1956), and, moreover, the R scale 
was included in the present study be- 
cause of its previously demonstrated 
relationship to the E-I dimension. 
The rationale for the interpretation 
of Factor II is equally unclear. The 
authors cite studies by Eriksen 
(1954a, 1954b) and by Eriksen and 
Davids (1955), which showed that 
college students obtaining high Hy 
and Pt scores, respectively, had cer- 
tain characteristics in common with 
Eysenck’s extraverted and intro- 
verted neurotics. However, these
findings do not seem especially rele-
vant [...] loading on Factor [...]
differentiate between hysterics, psycho-
paths, and anxiety states, the latter
an “introvert” group; moreover, Hy 
correlated negatively (—.115) with 
the hysteria-anxiety dichotomy. 
Franks et al. are undoubtedly wise 
to make a conservative interpreta- 
tion of Factor III, in terms of its
major variable, R. Nevertheless, the
identification of this factor (rather 
than Factor II) with E-I would seem
to be more defensible on the basis of
existing evidence.

6 [...]tion. Perhaps the issues raised [...]
dealt with in the revision [...]

A broader picture of the relation- 
ships between E-I variables from the 
MMPI and the factorial question- 
naires comes from two correlational 
studies—one by Nelson and Shea 
(Ne), using MMPI and the STDCR 
inventories, the other Karson and 
Pool’s study (KaA) of the MMPI and 
16 PF. Relevant coefficients from the 
two studies are reproduced in Table 
5. It can be seen that only the Si 
scale is consistently related to the 
extraversion primaries from the two 
factorial questionnaires. Ma and D 
tend to correlate with the principal 
scales from the previously described 
extraversion ‘cluster’ (Guilford’s R 
and S, Cattell’s F and H) but with 
no others; and the remaining MMPI 
scales—K, Hy, Pd, and L—have 
little in common with the factorial 
measures. 

Karson and Pool’s data shed 
further light on the nature of 16 
PF Factor M, Autia. M correlates 
not only with K and Si, but also with
MMPI Pt (.48), Sc (.48), Mf (.47)
and F (.46)—all scales which are 
linked with maladjustment (see Ta- 
ble 4). Thus, earlier indications that 
M may be essentially a maladjust- 
ment factor seem to be borne out 
here. 

It is regrettable that the authors of 
these studies did not carry out factor 
analyses of their data. In Karson 
and Pool’s publications, complex rela- 
tionships between the scales are not 
readily disentangled by inspection 
of the matrix, although some clarifi- 
cation is provided in a separate arti- 
cle (Karson, 1958). Nevertheless, the 
two studies are of interest in provid-
ing an empirical link between MMPI
Ma, D, and Si and the principal ex- 
traversion primaries from the fac- 
torial questionnaires. 


TABLE 5

INTERCORRELATIONS BETWEEN E-I VARIABLES:
MMPI, STDCR, AND 16 PF QUESTIONNAIRES
(From analyses Ne and KaA)

                             MMPI a
Variable     K        D        Hy       Pd       Ma       Si
STDCR
  S                   28*     -24      -15      -46**     69**
  R                  -30**    -07      -06       50**    -50**
16 PF
  A          13      -11       20      -09       00      -33**
  E          23      -11       05       02       10      -27*
  F          08      -26*      05       01       24*     -48**
  H          2       -24*      05       00       19      -69**
  M         -48**     17       16       29*      17       32**
  Q1        -10       22   [illegible]  19       11       11
  Q2        -22       01      -03      -02       03       32**

Note.—Italicized coefficients indicate reversals of expected sign.
a L, not included in analysis Ne, did not correlate significantly with any 16 PF extraversion primaries.
* Significant at .05.
** Significant at .01.


Questionnaire Factors and Acquiescence


That E-I questionnaire factors
may reflect certain response tenden-
cies—as has been suggested by Kasse-
baum, Couch, and Slater in connec-
tion with their MMPI factor—is
a possibility which merits careful 
consideration. Evidence has been 
presented to show that an E-I dimen- 
sion can be demonstrated in the vari- 
ous questionnaires. However, that 
evidence rests on the assumption that 
the questionnaire factors can be 
validly interpreted in terms of the 
“psychological meaning” of the vari- 
ables which define them. If the co- 
variation among the factor variables 
can be accounted for on some basis 
other than common meaning, the 
label extraversion-introversion would 
seem to be prematurely applied, and 
perhaps inappropriate. 

In order to examine the suscepti-
bility of the various “extraversion”
scales to agreement response set, or
acquiescence, the principal ques-
tionnaire variables have been listed
in Table 6, along with the percent-
ages of “agreement” and “disagree- 
ment” items they contain. Looking 
first at the Guilford scales, it can be 
seen that, in general, extraversion 
does tend to be associated with a 
higher percentage of agreement 
items, the only exceptions being 
GZTS S, which contains equal num- 
bers of “True” and “False” items, 
and T, where the trend reverses. 

TABLE 6

PERCENT “TRUE” AND “FALSE” ITEMS IN THE PRINCIPAL QUESTIONNAIRE
SCALES LOADING E-I FACTORS

(Includes all items indicating agreement [True, Yes, Always,
etc.] and disagreement [False, No, Never, etc.])

Scale    No. of items    % True    % False

GZTS
[scale labels not legible; surviving entries, in the order extracted:]
23 31
30 40
30 30
100 0
67 33
76 24
85 15
25 75
100 0
75 25
50 50
56 44
44 56
69 31

[table notes legible only in part:] “[...] as indicated by minus [...]”;
“[...] Yes, 32 (38%) for No responses.”; “[...] M (data not available) [...]”

An attempt was made to rule out
various response tendencies in the 16
PF test, by balancing the number
of “Yes or a” and “No or c” items
scored for each scale (Cattell, 1956c).
However, many of the items are not
of the simple endorsement type, re-
quiring instead a choice between al-
ternative statements, e.g., “I would
prefer the life of (a) an artist, (b) un-
certain, (c) a secretary running a so-
cial club.” Such items can scarcely
reflect acquiescence, but, by the same 
token, they cannot be counted as part 
of the balanced distribution of items 
intended to rule it out. Thus, in de- 
termining the susceptibility of the 16 
PF scales to acquiescence, these 
“neutral” items must be disregarded 
and consideration given only to the 
items which reflect agreement or dis- 
agreement. It can be seen in Table 
6 that H and Q2 are equally weighted 
with “True” and “False” items; A, 
E, M, and Q1 differ by one item only. 
F, however, contains enough more 
“True” items so that scores on the F 
scale might be affected to some extent 
by acquiescence. 
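
The tallying procedure just described, in which “neutral” items are set aside and only agreement and disagreement items are counted, can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name `keying_balance`, the three category labels, and the item counts in the example are hypothetical and are not taken from Table 6 or from any published scoring key.

```python
def keying_balance(item_keys):
    """Summarize how the scored items of a scale are keyed.

    item_keys: list with one string per scored item:
        "agree"    - scored for True / Yes / Always
        "disagree" - scored for False / No / Never
        "neutral"  - a choice between alternative statements
    Neutral items are disregarded before percentages are computed.
    """
    agree = item_keys.count("agree")
    disagree = item_keys.count("disagree")
    counted = agree + disagree
    if counted == 0:
        raise ValueError("scale has no agreement or disagreement items")
    pct_true = round(100 * agree / counted)
    return {
        "n_counted": counted,           # items entering the tally
        "pct_true": pct_true,           # percent keyed for agreement
        "pct_false": 100 - pct_true,    # percent keyed for disagreement
        "imbalance": agree - disagree,  # surplus of agreement items
    }


# Hypothetical scale: 7 agreement, 3 disagreement, and 5 neutral items.
keys = ["agree"] * 7 + ["disagree"] * 3 + ["neutral"] * 5
print(keying_balance(keys))
```

A scale with an imbalance of zero or one item would, on this reasoning, offer little opportunity for acquiescence to affect scores, while a larger surplus of agreement items leaves that possibility open.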

Turning to the MMPI scales, it 
can be seen that a “response set” 
interpretation of the factor obtained 
by Kassebaum, Couch, and Slater is 
supported not only by R, L, Ma, and 
D, but by the percentages of “True” 
and “False” items in the Re, Sy, Sp, 
and Im scales as well. Contrary to 
such an interpretation, however, are 
Do and St, which, though related to 
extraversion, contain more “False”
items than “True” ones, and Si, 
whose items are evenly divided be- 
tween the two categories. 

It is quite possible that “psycho-
logical meaning” and acquiescence
are confounded in a number of the 
questionnaire scales defining E-I 
factors, and until some means is 
found for distinguishing the two com- 
ponents, factor interpretations must 
take both aspects into account. At 
the same time, it is apparent that the 
E-I factors cannot be “explained” in
terms of acquiescence alone. For the 
present, then, interpretations based 
on “psychological meaning” may be 
considered as having some validity. 


Analyses of Objective Tests 


Several objective test analyses re- 
lated to E-I have appeared in recent 


years—some carried out by Cattell 
and his associates, others from Ey- 
senck's laboratory. The latter studies 
rely heavily on tests of supposed or 
demonstrated relevance to particular 
dimensions of personality, whereas 
Cattell’s analyses are based on tests 
intended to cover the entire “per-
sonality sphere.” As might be ex-
pected, the test batteries used in the 
two sets of studies differ consider- 
ably, and the resulting E-I factors 
are not readily compared. 


Analyses from Cattell’s Laboratory 


Cattell’s objective test Factor UI 
32, formerly Schizothyme With- 
drawal, is now described as an extra- 
version factor (Cattell, 1957b); it 
has been renamed Exvia-Invia. One 
of the least confirmed objective test 
factors, UI 32 has appeared in only 
three analyses (CaA, CaB, Sc). As 
seen in Table 7, the loadings are gen- 
erally small, and they vary some- 
what from study to study. Never- 
theless, there is some agreement as to 
the relative importance of fluency, 
ego strength, and inaccuracy—char- 
acteristics not infrequently associ- 
ated with extraversion. A further 
link with E-I is provided by Cat- 
tell’s CaB analysis, in which 16 PF 
Factors A, E, F, and H were found to 
correlate with UI 32. However, 
these findings are not supported by 
the more recent Scheier and Cattell 
analysis (Sc), in which only one 
of the questionnaire primaries—A— 
has a sizeable loading on UI 32. As 
Table 7 shows, F and H contribute 
little to the Scheier and Cattell E-I 
factor, and M, the highest-loading 
variable on Cattell’s rating and ques- 
tionnaire factors, has a zero loading; 
M appeared instead on a separate 
Autia factor (.40) and, negligibly, 
on UI 24, Anxiety vs. Dynamic 
Integration. M apparently failed to
correlate with UI 32 in the CaB anal-
ysis as well:7 at least, no coefficient
is given in a recent report of the study
(Cattell, 1957b). In view of Cat-
tell’s insistence that “autia, M, be-
longs very definitely with the ‘in-
troversion’ factors” (1957b, p. 317),
[the identificat]ion of UI 32 with E-I
[would seem to requ]ire clarification.

7 Although it had a substantial loading
(.46) on UI 24, according to Cattell and
Scheier (1958).


TABLE 7

OBJECTIVE TEST FACTORS: CATTELL’S LABORATORY

Factor Identification
Variable CaA CaB Sc
VII XV Ve
Fluency on own characteristics 23 45 a
Fluency on self (vs. others) criticism
Fluency on dreams 13 21 JIA
Rate of reading (delayed feedback) -22!
Correctly articulated words (delayed feedback, reading) -22
Correct word rate in reading (delayed feedback) -36A
Immediate memory for words = 54,
Myokinetic movement -26
Objects perceived in unstructured drawings 09 49
Accuracy in gestalt completion -14 -32 =
Ratio accuracy/accomplishment -22A
Slanting line errors in CMS -10 -31 29
Handwriting pressure -31
High self-estimate of experience in various skills 20
Self-confidence in untried performance 13 30 lu
Ego strength: Little shift to successfuls 20 03 30A
Authority suggestibility -22 06
Ratio acquaintances/friends -22M
Preference for familiar (vs. strange) material 24
Preference for weak (vs. strong) smells -24
Speed of regularly warned reaction time -42
Pupil dilation at stress =28
Increase in heart rate after startle 24
Systolic blood pressure 25
C: Free anxiety 28M
Q: 16 PF A, Cyclothymia 39a 53
E, Dominance 460
F, Surgency 460 11
H, Parmia 430
M, Autia = 01

Note.—Tests from The Objective-Analytic Personality Test Battery (Cattell, 1956a), [...]
[...]ions between questionnaire scores and objective test factor, [...]


Analyses from Eysenck’s Laboratory


Before turning to the objective test
analyses carried out by Eysenck and
his colleagues, it is necessary to say a
few words about the underlying ra-
tionale. Eysenck’s research over the
years has culminated in a rather elab-
orate theory of extraversion-introver-
sion (Eysenck, 1957)—essentially a
rapprochement of the early views of
Jung (1923) and McDougall (1926,
1929), Pavlov’s concept of inhibition,
and Hull’s learning theory. Ey-
senck’s theory has been criticized re-
cently for its frequent failure to ac-
count for data it claims to explain
(Storms & Sigal, 1958). It does,
however, have much to recommend
it, one of its chief assets being the ease
with which it can be operationalized. 
Tests of the theory have been based 
for the most part on comparisons of 
two broadly defined groups of neu- 
rotics, believed to represent the ex- 
tremes of the E-I continuum— 
hysterics, a group consisting of con- 
version hysterics and psychopaths, 
and dysthymics, a combination of anx- 
iety neurotics, depressives, and ob- 
sessionals. The rationale for these 
groupings comes chiefly from Jung 
and McDougall, who regarded hys- 
teria as the characteristic neurosis of 
extraverts, psychasthenia (anxiety, 
depression) as the typical introvert 
disorder. Eysenck added the remain- 
ing categories, and in an early fac- 
torial study (1944), obtained a 
“hysteria-dysthymia” factor which 
seemed to describe the two criterion 
groups. 

It was noted above that analyses 
from Eysenck’s laboratory have gen- 
erally made use of tests selected for 
their relevance to particular person- 
ality dimensions. Since E-I has been 
a major area of interest for Eysenck 
and his co-workers, their analyses 
have generally included tests found— 
or hypothesized—to differentiate be- 
tween hysterics and dysthymics. A 
number of such tests were included 
in Eysenck’s first large-scale objec- 
tive test study (1952), but while sev- 
eral factors emerged, none could be 
identified with E-I. Other analyses, 
however, have yielded factors which 
are at least suggestive of Eysenck’s 
E-I dimension; these factors are 
shown in Table 8. The factors ob- 
tained by Heron (He) and Himmel- 
weit, Desai, and Petrie (Him) have 
been discussed at length elsewhere 
(Eysenck, 1952, 1953) and require 
only brief mention. Appearing on 
these factors are a few tests found 
previously (Eysenck, 1947) to differ- 
entiate between hysterics and dys-
thymics—tests of persistence, and a
couple of measures derived from level- 
of-aspiration experiments. Personal 
tempo loads one of the factors 
(Him), but Eysenck, in the publica- 
tion just cited, has shown that his 
two criterion groups do not differ in 
this hypothetical E-I characteristic. 
Other tests supposedly related to E-I 
have negligible loadings on the fac- 
tors; a few—fluency, quick approach 
to timed test (He), speed/accuracy 
ratio (Him)—have no loadings at all. 
In general, then, the relationship of 
the two factors to Eysenck’s E-I 
dimension is not impressively dem- 
onstrated. 

In a more recent analysis (Ey), 
Eysenck obtained an E-I factor de- 
fined by two sociometric measures of 
“sociability” and an index of per- 
formance speed—all theoretically re- 
lated to extraversion, although the 
last one, at least, does not differenti- 
ate hysterics from dysthymics (Him- 
melweit, 1946). Apart from these 
measures, there is little to identify 
the factor with E-I. As Table 8 
shows, the remaining E-I variables8
have negligible loadings on the fac- 
tor; others, hypothesized as measures 
of E-I, had essentially zero loadings: 
two tests of rigidity, a cognitive 
humor test, an affective discrepancy 
measure related to level-of-aspira- 
tion, and self-rated extraversion. The 
latter measure and teacher-rated ex- 
traversion, which has a loading of .18 
on the factor, were based on rating 
scales adapted from Guilford’s R 
scale—which, as noted previously, 
Eysenck regards as a good measure 
of his dimension! On the whole, then, 
the factor obtained in this analysis 
does not seem entirely consistent 
with E-I as Eysenck defines it. 

8 Excluding projective test loadings, which
appear in Table 9 and are discussed in con-
junction with projective test analyses.


TABLE 8

OBJECTIVE TEST FACTORS: EYSENCK’S LABORATORY

Factor Identification
Variable Direction Ey He Hil Him
[...] I III I
Measures of E-I 20
Porteus Mazes: Starting time Quick 43¢
Crossed lines Many i 01
Lifted pencils Many Toi 364
Wrong directions Many -14 10A
Track Tracer: Speed High ae
Accuracy d =
Accuracy cost pye Low 32 34A
Personal tempo: Handwriting H 26A
O’Connor Tweezer Test hoe 15
Rigidity: Alphabet test few. 0
Humor preference: Sex High 16
Body build: [...] chest diameter Short-round x -18 26
Sociability I High cs
Sociability II ig!
I: Interests Few 28M
Q: STDCR S, Social Introversion Low 47d
R, Rhathymia High 56
R: Extraversion High 18
Measures of E-I and Neuroticism
Persistence: Leg Poor -23 44 07 17M
Hand Poor -01 46
Breath Poor 22M
Level of aspiration: Mean goal discrepancy Low positive 50
Absolute goal discrepancy Low =24
Judgment discrepancy High positive 17
Index of flexibility High -01 50
Measures of Neuroticism
Crown Word Connection List: “Neurotic” score High 33
Track Tracer: Performance under stress Poor -14
O’Connor Tweezer Test: [...] improvement Poor 52M
Body sway suggestibility: Total sway i
Reversals s ha ‘ail 95, 13M
Static ataxia: Total sway i
Reversals High -08M 18 11M
Dark vision 07
Systolic blood pressure Boas -13 28
Diastolic blood pressure -23
Pulse rate after stress -12
Sublingual temperature 20
Finger temperature 30
I: Annoyances Ma
Maudsley Medical Questionnaire (MMQ) High -20M -oM zá
MMQ Lie Scale High o! -26M
C: Mental health Pear =a -37A
Neuroticism High 17 30
Unclassified Measures
Perseveration: S-Z-SZ
7 and reversed tow 27
Strength of grip (hand dynamometer) Strong a
I: Food aversions Many 42
Zygoticity 07S
22

Note.—Variables with no loadings >.10 omitted from table; among them, several tests of E-I (see text) and [...] intelligence [...] or unclassified measures.
[...] indicates scoring direction. For others, indicates predicted direction for extraversion (first two groups of variables) or neuroticism (third group). Variables reflected where necessary, to agree with direction as listed here; positive loadings support prediction, negative ones do not.
Projective test loadings for this factor appear in Table 9.
[...] adjustment [...] obtained in this analysis.
Differentiates between [...] hysterics and 45 dysthymics at .05 or better (see text).
Directional prediction not stated.
Modified for children.
Coded as follows: monozygotic, 1; dizygotic, 2.


The most impressive study coming
from Eysenck’s laboratory is the
one reported recently by Hildebrand
(Hil). Hildebrand tested 25 male
normal subjects and a large group
of male neurotics, including 45 hys-
terics (25 conversion hysterics, 20
psychopaths), 45 dysthymics (25
anxiety states, 10 depressives, 10 ob-
sessionals), and 55 cases with mixed
symptomatology. In accord with Ey- 
senck’s theory, the conversion hys- 
terics and anxiety states were re- 
served as E-I criterion groups; these 
two groups, together with the normal 
subjects, constituted criterion groups 
for neuroticism. A factor analysis 
was then carried out, using intercor- 
relations based on the remaining 95 
subjects. Rotational criteria are not 
described in Hildebrand’s article, but 
in a personal communication,9 he
indicates that Factor III, Extraver- 
sion-Introversion, was rotated to 
Guilford’s R scale. 

As Table 8 shows, all of the pre- 
dicted E-I loadings on Hildebrand’s 
factor are in the expected direction; 
some, however, are extremely small. 
Sizeable loadings on Guilford’s R and 
S scales identify the factor with the 
previously discussed STDCR extra- 
version factors, but a question might 
still be raised about its relationship 
to Eysenck’s dimension. 

The question is answered by the 
second part of Hildebrand’s analysis, 
in which E-I factor score comparisons 
were made for the various groups of 
subjects. Normal subjects were 
found to be the most extraverted, 
followed by hysterics, mixed neu- 
rotics, and dysthymics, in that order. 
Significant differences were obtained 
between conversion hysterics and 
anxiety states, and between the larger 
hysteric and dysthymic groups as 
well. 

9 Hildebrand, H. P., Personal communica-
tion, March 4, 1959.

The results of these comparisons
demonstrate convincingly the rela-
tionship of Hildebrand’s E-I factor to
Eysenck’s dimension; they likewise
seem to lend impressive support to
Eysenck’s theory. Not to be over-
looked, however, are some important
problems in the study itself and in
the interpretation of the results. In
the latter category, the greater ex- 
traversion of the normal group pre- 
sents some difficulties, although, as 
Hildebrand suggests, it may simply 
reflect an unfortunate choice of con- 
trol subjects. However, there is con- 
siderable evidence to indicate that 
conversion hysterics, at least, are no 
more extraverted than unselected 
normal subjects; they have con- 
sistently been found to score at or 
below the “normal” mean on E-I 
questionnaires (Eysenck, 1959; Sigal, 
Star, & Franks, 1958; Storms & 
Sigal, 1958). Calling attention to this 
finding, Eysenck (1959) notes that 
it “is not quite in line with expecta- 
tion, but has been repeated on several 
samples and must be accepted” 
(p. 6). 

Concerning the analysis itself, 
Storms and Sigal (1958) found from 
Hildebrand’s original data (1953) 
that the groups pooled for the factor 
analysis differed significantly in vari-
ance on some of the tests. Discrim-
inant functions computed for Hilde- 
brand’s data by Storms (1958) dis- 
tinguished between conversion hys- 
terics and anxiety states better than 
the factor scores, yet showed hys- 
terics and psychopaths to be the most 
widely separated groups in terms of 
test performance. Hamilton (1957) 
has called attention to a similar lack 
of homogeneity within the dysthy- 
mic group, noting that on a dozen or 
so measures used in Hildebrand’s 
study, either anxiety states or obses- 
sionals performed more similarly to 
hysterics than to the other dysthymic 
subgroups. It is hard to tell whether 
these various inconsistencies can be 
attributed to the particular tests 
used, or whether they inhere in the 
criterion groups themselves. What 
evidence is available, however, seems 
to favor the latter explanation. In 
his early studies of hysterics and
dysthymics, Eysenck (1947) repeat-
edly found larger standard devia- 
tions for the dysthymic group, lead- 
ing him to suggest the possibility 
that “the dysthymic group con- 
tains several distinct subgroups” (p. 
251). Moreover, data from a later 
study (Eysenck, 1952) showed no 
significant differences between hys- 
terics, psychopaths, and anxiety 
states on tests of persistence, speed,
accuracy, goal discrepancy, judgment 
discrepancy—all tests found previ- 
ously to differentiate hysterics from 
dysthymics (Eysenck, 1947). 

While these findings cast some 
doubt on the validity of the hysteric- 
dysthymic dichotomy, the fact re- 
mains that the two criterion groups 
are significantly differentiated by 
the major variables defining Hilde- 
brand’s factor (see Table 8) and by 
their E-I factor scores. Hildebrand’s
analysis thus establishes an impor-
tant link between Eysenck’s concep-
tion of E-I and the questionnaire
factors defined by the Guilford scales.

Finally, brief mention should be 
made here of Becker's analysis (Be), 
discussed previously in connection
with analyses of the Guilford and
Cattell questionnaires. It will be re-
called that Becker’s E-I factor ap-
peared to resemble the Lack of Self-
Control factor obtained by Mann;
its relationship to Eysenck’s dimen-
sion is indica[ted] [...]), and basal
[...]ing, aniseikonic lens tests, flicker
fusion). Of the 32 variables derived
from these
tests, not one had a loading as great 
as .35 on the E-I factor. The only 
crucial variable to load over .30 was 
a kinesthetic aftereffect decrement 
score, which appeared to reflect little 
more than noncrucial differences in 
baseline, and which proved to have a 
retest reliability of zero. Thus, while 
the questionnaire loadings on Beck- 
er’s factor readily identify it with 
Eysenck’s concept of extraversion, it 
does not lend impressive support to 
more recent extensions of the con- 
cept. 


Analyses of Projective Tests 


The search for projective test 
counterparts of E-I has focused on 
the Rorschach test—the most widely 
studied projective instrument, and 
the only one linked by theory with 
the E-I dimension. It might be men- 
tioned, however, that Sirota (1957) 
has identified an extraversion-like 
factor in another projective instru-
ment—the psychoanalytically-ori-
ented Blacky test (Blum, 1950).
Sirota’s factor, Impulse Expression
vs. Impulse Control, may be related
to the previously discussed “malad-
justed extraversion” factors, but at
present no empirical comparisons can
be made.

The theoretical link between E-I 
and the Rorschach test is provided 
by Rorschach’s concept of experience 
balance, expressed as the ratio of 
human movement (M) to color (Sum 
C) responses given to the Rorschach 
inkblots. Extratensive subjects, with 
a ratio favoring color, are said to be 
outwardly oriented, by virtue of 
their responsiveness to objective re- 
ality, i.e., color stimuli present in the 
blots. The perception of movement,
on the other hand, has no correspond- 
ing external reality, and thus requires 
an intervening subjective process. 
Consequently, introversive subjects,
with a preponderance of movement
responses, are described as having a
more active “inner life” and less con-
cern with external, objective reality. 
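
The experience balance ratio can be made concrete with a small worked example. The sketch below, in Python, is offered only as an illustration: the weights used to form Sum C (0.5 for FC, 1.0 for CF, 1.5 for pure C responses) are a conventional scoring assumption not stated in the text above, the label “ambiequal” for a tied ratio is likewise an assumption, and the function names and response counts are hypothetical.

```python
def sum_c(fc, cf, c):
    """Weighted sum of color responses.

    The weights 0.5 / 1.0 / 1.5 for FC / CF / C are an assumed convention,
    supplied here for illustration only.
    """
    return 0.5 * fc + 1.0 * cf + 1.5 * c


def experience_balance(m, fc=0, cf=0, c=0):
    """Compare human movement responses (M) with the weighted color sum.

    Returns (M, Sum C, label). A ratio favoring movement is labeled
    introversive; one favoring color, extratensive; a tie, ambiequal.
    """
    color = sum_c(fc, cf, c)
    if m > color:
        label = "introversive"
    elif color > m:
        label = "extratensive"
    else:
        label = "ambiequal"
    return m, color, label


# Hypothetical protocols, invented for illustration.
print(experience_balance(5, fc=2, cf=1))        # movement outweighs color
print(experience_balance(1, fc=2, cf=2, c=1))   # color outweighs movement
```

The comparison is purely ordinal in this sketch; no cutoff for a “markedly” extratensive or introversive ratio is implied.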

While Rorschach (1951) denied 
any relationship between his experi- 
ence balance concept and Jung’s 
extraversion-introversion, the two 
viewpoints seem to have much in 
common. Rorschach’s distinction be- 
tween objective and subjective ori- 
entation is the crux of Jung’s theory, 
and descriptions of the two Ror- 
schach “experience types” are re-
markably like Jung's characteriza- 
tions of the extravert and introvert. 
Moreover, evidence from several 
studies indicates that some of the em- 
pirically found differences between 
extratensive and introversive sub- 
jects correspond to hypothesized or 
observed differences between extra- 
verts and introverts (Bash, 1955; 
Bieri & Messerley, 1957; Mann, 
1956; Palmer, 1957; Singer & Spohn, 
1954). 


Analyses of the Rorschach Test 


Several Rorschach analyses have 
produced factors which appear to be 
related to experience balance, and 
which also have loadings on some 
non-Rorschach measures suggestive 
of E-I. The relevant factors are 
shown in Table 9, as are the projec- 
tive test loadings for Eysenck’s E-I 
factor (Ey), discussed above. 

Eysenck included a number of 
Rorschach variables in his analysis, 
and obtained from a Rorschach “ex-
pert” opinions concerning their rele-
vance to E-I. As Table 9 shows, 
Eysenck’s extraversion has loadings 
on Rorschach D, FM:M, F%, and 
P; introversion is defined chiefly by 
M% and a composite pathology 
score. Expert opinion concurred with 
all but the F% loading, and Eysenck 
concludes that, on the whole, his re- 
sults support the hypothesized rela-
tionship between E-I and Rorschach’s
extratension and introversion. It 
will be noted, however, that Ey- 
senck’s analysis included no color 
variables; his results thus say nothing 
about a relationship between extra- 
version and extratension. Nor can 
such a relationship be inferred from 
the fact that M% appears at the 
introvert pole, for Rorschach factors 
defined by M are not necessarily re- 
lated to experience balance, as will 
be seen presently. 

More pertinent to the experience 
balance question are two of the 
factors obtained in Singer, Wilensky, 
and McCraven’s analysis (Si). Fac- 
tor III, Emotional Surgency, com- 
bines positive loadings on the Ror- 
schach color determinants with a 
small negative M loading; Factor IV, 
Introspectiveness, has substantial 
positive loadings on M and on a re- 
lated measure, movement threshold 
(Barron, 1955). These factors, in 
turn, have loadings of .50 and —.32, 
respectively, on the first of two sec- 
ond-order factors reported by the 
authors, and thus seem to reflect a 
bipolar dimension of some sort. The 
movement-color contrast suggests 
that the dimension may be experience 
balance, and that Emotional Sur- 
gency and Introspectiveness cor- 
respond to Rorschach’s extratension 
and introversion. 

Concerning the relationship of the 
two factors to E-I, it might be noted 
that the ratings which appear on 
Factor III reflect a responsiveness to 
the environment (albeit a negative 
one!) which might suggest extraver- 
sion; likewise, acquiescence to au- 
thority and general disinterest in ex- 
ternal events, associated with Fac- 
tor IV, do not seem inconsistent with 
introversion. Moreover, the small 
level-of-aspiration loadings agree 
with Eysenck’s (1947) findings for 
hysterics and dysthymics. Addi- 

TABLE 9 
PROJECTIVE TEST FACTORS 
Factor Identification 
Variable E FoA FoB Si Wi, 
a m)t|mlwiyvlaf/myiw| 1P 
Barron Movement Threshold 56 | 01 y 
TAT Transcendence Index —03 | 42 
Rorschach: * F Å 
W 63 |-33 |—07 f 
Wo 32 | 02 |—47 |-26 | 35 
Wo 00 | 43 |—66 |—02 | ‘<r i 
D 51 4 oi 06, 48 
M 14 |-4s | 75 x 81 |—27 | 65| 26, 36 
FM 35 |—56 |—53 |—02 |~07 | 52 | 52 33, 39 
Fo so | 22 |-75 
c =19 | 57 |—14 | 09, 26 
FC 07 | 56] 15| 65 48 
CF ` 02 | 37| o6 | o0 —12 
tal 20, —06 
Pd 24 04, 35 
R a1 | 28] 34 O7 | 53) 44] 27° 44 
Rorsch: ios, %, etc.: 
Fam ates % E 14 |~68 | 70 22 |—o8 | 14 
Special F-+-% (highly 
articulated responses) 18 |-24 | 86 
um shading 60 | 21] 05 
šum C 69 | so} oo 
M: Sum C 36 |-53 | o8 |—09 
T: Weghaler- Bellevue 
cale - 
Verbal 16 C 08 1-81 | 06 |~06 |~s1 | o0 ao | OF} 19 [12 00, —08 
Vocabulary 44 |-62 | 07 |~11 1-63 | og 01 
Digit Span Scatter —02 |—47 |—05 |~29 
Wechsler Yumber-Square 
nitial performance 
Level assets —18 18 |—07 
Porteus Mazes 23 |—27 15 
Authority Reaction Test 48 |-07 |-13 
Motor inhibition OL} 11| 57 
Time estimation 69| 00| 12 
Digit Frustration 17 | 28| 02 
: Anxiety 65 | 24 |-14 24| 05] 60 
+ Aggressiveness 
‘ooperativencss =i 44 |—06 
Interest level i He E Si 
iffuse energy level Ss 
Planfulness —04 | 35 |—21 
MMPI 02 | 07 |~13 
L, Lie Scale 
P, Validity Scale 
K, uppressor Scale 
Bes Hypochondriasis 
pression kei 
ty Hut an ; 721-09 | 13 |—43 
Pa Bara cpathic Deviate 32 25 13| 44 
Ps, Psychasthenia 98 |703 | of |207 
So Schizophrenia si |—o1 el 33 
a, i; 
o ARR 14 | 27 |-18 | 37 
5 €] ii 
Bernreuter A 
F1-C, Confidenced 
F2-5, Sociabilityd T80 | 24 |-01 | 26 
T: Allport-Vernon =24| 66 | 15] 00 
heoretical 
‘conomic 27 |-64 | 16] 22 
esthetic =a 10 |-52 | 26 
Political 37 |-27 | 42 |—26 
=10 | 67 |-24 44 


bc ages pune TES Pt score, —40; PYG, 29; a Ad 


ain respectively” 35; F, —20, 37; Fe, 17, 12; K, 16, —07; c, =01;.17; 
rection indicated by scale title. 


d to agree with 


tional information might be sought in 
Porteus Maze performance; the 
quantitative score used here is essen- 
tially an index of accuracy, and it 
ought to be closely related to the 
component variables—crossed lines, 
wrong directions, etc.—previously 
associated with extraversion (Table 
8). As Table 9 shows, however, the 
maze score contributes to neither 
factor above; it is found instead on 
Factor I, Motor Inhibition. This 
factor, like Introspectiveness, is sug- 
gestive of introversion; yet, despite 
some important similarities, the two 
factors are negatively correlated 
(—.15). Moreover, Motor Inhibition 
appears on the other second-order 
factor, which has zero loadings for 
Emotional Surgency and Introspec- 
tiveness. It would seem, then, that 
in addition to a pair of factors cor- 
responding to experience balance, 
Singer et al. have uncovered a sec- 
ond, independent “introversion” fac- 
tor in the Rorschach test. The latter 
factor, though unrelated to experi- 
ence balance, seems as reasonable a 
match for E-I. 

Singer et al. consider the three 
factors just discussed to be similar 
to Thurstone’s Reflectiveness and 
Impulsivity factors (Table 3), and to 
the Emotional Drive and Inhibition 
factors obtained in two analyses by 
Foster (Table 9). Thurstone’s factors 
can be related to Singer’s only by in- 
ference, but in the case of Foster’s 
factors, some direct comparisons can 
be made. 

Factor I in Foster’s first analysis 
(FoA) is called Emotional Drive; it 
shares with Singer’s Surgency high 
loadings on Rorschach R and the 
color determinants (Sum C here). 
Factor III, Delay and Inhibition, ap- 
pears most similar to Singer’s Intro- 
spectiveness, although an important 
discrepancy is seen in the F+ load- 
ings. 

In his second study, Foster used a 
modified Rorschach procedure to 
control for differential responsivity. 
Subjects were instructed to give at 
least three responses to each of the 
first nine cards, and at least six to 


Card X; the analysis was based on 
the required minimum (33 responses) 
for each subject. As can be seen in 
Table 9, the factors obtained in this 
analysis (FoB) are quite unlike those 
in the first study. Factor I, Delay 
and Inhibition, has a small M: Sum 
C loading, but the unusually high 
loadings on MMPI Pt and Sc—and 
on the Bernreuter F1-C scale—mark 
it as a probable maladjustment fac- 
tor. Factor IV, Emotional Drive, re- 
sembles its FoA counterpart in W, 
although the absence of the color 
component here argues against the 
identity of the two factors. 

It is curious that in the search for 
factors comparable to their own, 
Singer and his associates overlooked 
the second factor in Foster’s two 
analyses. These two factors have 
important loadings on Vocabulary 
and Verbal IQ, and Foster describes 
them as Verbal Adjustment factors. 
However, the prominent loadings on 
Rorschach M and Sum C (or M: Sum 
C) indicate that the two factors are 
closely allied with the experience bal- 
ance concept; they seem to parallel 
the second-order experience balance 
factor obtained by Singer, Wilensky, 
and McCraven (1956). Some other 
variables appearing on the two fac- 
tors (particularly the FoB factor) 
suggest a relationship to E-I: Bern- 
reuter F2-S, MMPI Ma, perhaps the 
Allport-Vernon Political scale, which 
seems to be linked with extraversion 
(Eysenck, 1954). Even the strong 
verbal component might be looked 
upon as favorable evidence; Himmel- 
weit (1945) has shown that dysthym- 
ics do better on vocabulary tests than 
on nonverbal measures of intelli- 
gence, whereas the reverse is true for 
hysterics. 

The foregoing studies seem to sup- 
port the validity of the experience 
balance concept, and they at least 
hint at a relationship between this 
concept and E-I. On both points, 
however, there is equally impressive 
evidence to the contrary. Several ex- 
tensive Rorschach analyses have 
failed to produce anything resem- 
bling an experience balance factor 
(Borgatta & Eschenbach, 1955; Kar- 
son & Pool, 1957a; Wittenborn, 1950a, 
1950b), and inasmuch as these analy- 
ses do not seem to differ from the 
preceding ones in any consistent way, 
the discrepant results leave some 
doubt about the dimensionality of 
experience balance. In a similar vein, 
Williams and Lawrence (1953) ob- 
tained two factors with small con- 
trasting loadings on M and C; in 
both cases, FC appeared at the “in- 
trovert” (i.e., M) extreme, and CF 
had no loading at all. Even the ap- 
pearance in Singer, Wilensky, and 
McCraven’s analysis of an “introver- 
sion” factor unrelated to experience 
balance muddies the waters consid- 
erably, although it does not, of 
course, rule out a relationship be- 
tween E-I and experience balance. 
Finally, there is the evidence from 
comparisons of Rorschach measures 
with the various multidimensional 
questionnaires, below. 


Joint Analyses: Rorschach and 
Questionnaire Measures 


In an early attempt to demonstrate 
a relationship between E-I and ex- 
perience balance, Thornton and Guil- 
ford (Tho) correlated ... meas- 
ures from the ... 
Guilford’s factors S, E, M, R, and T. 
They obtained no ... C. 
Royal (Ro) undertook 
a similar task, using the S, T, and 
R scales from the STDCR inventory. 
He was unable to find a single sig- 
nificant correlation between the three 
scales and a dozen potential Ror- 
schach indices of E-I, including M, 
Sum C, and M: Sum C. 

Similar results have been obtained 
with the MMPI. Palmer (1956) re- 
ports that chi square comparisons of 
the MMPI scores for 30 extratensive 
and 30 introversive subjects indi- 
cated no relationship between experi- 
ence type and Si scores; other dif- 
ferences were “so few as to be 
of doubtful significance” (p. 208). 
Williams and Lawrence in a joint 
factor analysis (Wi) of the Rorschach 
and MMPI, obtained an “expressive- 
repressive” factor, shown in Table 9. 
The MMPI loadings on this factor 
are reminiscent of the E-I factor ob- 
tained by Kassebaum, Couch, and 
Slater. The Rorschach loadings, 
however, certainly do not correspond 
to experience balance. Foster’s sec- 
ond analysis (FoB) is also relevant 
here. Of the three factors discussed 
above, only Factor I, Delay and In- 
hibition, has any sizeable MMPI 
loadings. The appearance of MMPI 

on the factor is consistent with in- 
troversion, but equally so with mal- 
adjustment—a more reasonable in- 
terpretation in terms of the other 
loadings. Factor V in Foster’s analy- 
sis, (Hypo) Manic-Depression, has 
not been mentioned previously. This 
factor, shown in Table 9, resembles 
the MMPI E-I factors considered 
earlier, but it has no important 
Rorschach loadings. 

The projective test studies re- 
viewed here attest to the unreliabil- 
ity of “apparent similarity” as a 
basis for matching factors. Some 
of the Rorschach factors appear to 
reflect certain characteristics asso- 
ciated with E-I; the identification is 
strengthened by occasional small 
loadings on E-I variables from other 
media. None of the evidence is very 
impressive, however, and the results 
of the joint analyses just discussed 
indicate that E-I questionnaire fac- 
tors, at least, have little in common 
with the extraversion-like factors 
obtained from the Rorschach test. 


EVALUATION 


To what extent has the nature of 
extraversion-introversion been clari- 
fied by recent multivariate research? 
What more—if anything—can be 
said about the unidimensionality of 
the construct, or its relationship to 
adjustment, on the basis of this re- 
search? These questions can perhaps 
be answered best by summing up the 
evidence in terms of the criteria set 
forth at the outset. 


Extraversion-Introversion and 
Unidimensionality 

The foregoing analyses indicate 
that it is possible to identify in all 
extensively studied measures and 
media at least one factor which 
bears some resemblance to tradi- 
tional conceptions of E-I. The favor- 
able results of early rating studies 
find confirmation in Cattell’s dis- 
covery of an E-I factor in data from 
behavioral observation. Clear-cut 
factors have likewise emerged from 
analyses of various multidimensional 
questionnaires. Objective test bat- 
teries have in most cases yielded 
factors suggestive of E-I; in general, 
however, the factor loadings have 
been small, and interpretations some- 
what uncertain. In the realm of 
projective tests, an extraversion-like 
factor has been found in the Blacky 
test, and factors identifiable with 
Rorschach’s experience balance have 
appeared sporadically in analyses of 
the Rorschach test; the latter factors 
are linked by theory, at least, with 
E-I. In the various media, then, the 
situation remains essentially as Ey- 
senck found it in 1953, with well- 
defined E-I factors appearing in 
questionnaire and rating studies, sug- 
gestive ones in analyses of objective 
and projective tests. True, a great 
deal more evidence has accumulated, 
particularly in the questionnaire 
medium, and much of it is favorable. 
Nevertheless, in terms of the first 
criterion—the consistent appearance of 
E-I factors in all media of observa- 
tion—the unidimensionality of extra- 
version-introversion has not been 
conclusively demonstrated. 

In terms of the second criterion— 
the interrelatedness of the obtained 
factors—the evidence is meager. No 
empirical comparisons have been re- 
ported for the objective or projective 
test factors obtained by different in- 
vestigators; similarities have been 
noted in some cases, but the diversity 
of the variables, procedures, and 
populations represented in these 
studies makes speculation hazardous. 
Evidence from questionnaire studies 
shows that, in general, repeated 
analyses of the same instrument 
yield similar-appearing factors which, 
on the basis of “psychological mean- 
ing,” can be identified with E-I. 
Such factors have been found in the 
questionnaires of Guilford and Cat- 
tell, and in the MMPI. Factor load- 
ings vary from study to study, and 
variables are sometimes added or 
dropped, but there remains in each 
of the questionnaires a “core” of 
variables which appear consistently 
on E-I factors, regardless of the 
population studied, or the factorial 
procedure employed. Moreover, evi- 
dence from several studies shows that 
the core variables from the various 
questionnaires are at least moder- 
ately interrelated. Weighing against 
these very favorable findings, how- 
ever, are the results of several joint 
analyses of the Guilford and Cattell 
questionnaires, showing that at 
least two independent factors are re- 
quired to account for the inter- 
correlations between the E-I vari- 
ables. 

Little information is available con- 
cerning the relationships between 
E-I factors from different media. 
Cattell’s rating and questionnaire 
factors appear similar, and a few of 
the questionnaire variables are re- 
lated to his objective test E-I factor, 
although inconsistently. Objective 
test factors from Eysenck’s labora- 
tory are linked by one study with the 
Guilford questionnaire factors, by 
another—though less certainly—with 
some of the Rorschach variables. On 
the other hand, joint analyses of 
the Rorschach test and various ques- 
tionnaires suggest that the extra- 
version-like factors from these in- 
struments are probably unrelated. 
It appears, then, that despite an 
impressive accumulation of relevant 
multivariate research, the unidimen- 
sionality of extraversion-introversion 
has not been unequivocally demon- 
strated. 


Extraversion-Introversion and 
Adjustment 


Except for the projective test 
analyses, where what constitutes 
an “adjustment” factor is not readily 
ascertained, virtually every analysis 
which has produced an extraversion- 
like factor has also yielded a factor 
identifiable with some aspect of ad- 
justment. The latter factors, known 
variously as ego strength, general 
adjustment, neuroticism, anxiety, 
etc., appear to be essentially inde- 
pendent of E-I. The independence 
resulting from orthogonal rotation, 


nnaire factors of 
for example, cor- 
hus, according to the 
lated factors, ex- 


In many cases, h 


been noted that the E-I factors seem 


of adjust- 
actor pat- 
shows that 
at least a 
tors from the same analyses. Look- 
ing at the questionnaire factors, it 
can be seen further that in analyses 
which have yielded a single E-I fac- 
tor, the shared variables tend to 
align with that factor in such a way 
that “good” adjustment is asso- 
ciated with extraversion, “poor” ad- 
justment with introversion. The 
tendency is most apparent in anal- 
yses of the Guilford and Cattell 
questionnaires (Table 2); in only one 
instance (Factor C in Analysis De) is 
an important extraversion variable 
linked with maladjustment. As might 
be expected, the tendency is less 
pronounced in the case of the MMPI 
factors (Table 4), where many of the 
variables are intrinsically related to 
maladjustment. Nevertheless, only 
two conspicuous exceptions are found 
—the Ma and Pa scales, which tend 
to be related to both extraversion 
and maladjustment. It is doubtful 
whether Pa should be counted; the 
Pa loadings on E-I factors are some- 
what inconsistent. And while the 
Ma scale does appear consistently at 
the “extravert” extreme, there is 
some evidence that it may be related 
only to maladjusted extraversion. In 
the case of questionnaire analyses 
yielding more than one extraversion- 
like factor, there are some indica- 
tions that adjustment may be in- 
volved in the split. It has been 
noted in connection with these anal- 
yses that one of the factors generally 
bears some resemblance to “well- 
adjusted” extraversion, while an- 
other appears to reflect maladjusted 
extraversion. It has also been noted 
that such pairs of factors share few 
E-I variables, and thus seem to repre- 
sent qualitatively different dimen- 
sions. 

Turning to the factors from other 
media, it should be mentioned that 
none of the variables defining Cat- 
tell’s E-I rating factor have loadings 
as great as .30 on his second-order 
anxiety factor. However, the ex- 
traversion factor has a substantial 
negative loading on M, Autia, and 
if this primary rating factor is a true 
counterpart of questionnaire factor 
M, there is reason to suspect that in- 
troversion and maladjustment may 
be confounded in the E-I rating 
factor. By the same token, the ap- 
parent absence of M from Cattell’s 
objective test factor UI 32 favors 
the independence of the latter factor 
from adjustment. Indeed, it can be 
seen in Table 7 that the few “adjust- 
ment” variables which appear on 
UI 32 are about evenly divided be- 
tween the two poles of the factor. 
Unfortunately, the absence of M also 
raises a question about the relation- 
ship of UI 32 to Cattell’s E-I factors 
in other media. Among the objective 
test analyses represented in Table 8, 
Eysenck’s study yielded no adjust- 
ment factor, but his E-I factor links 
introversion with “pathology” as 
reflected in the Rorschach test. The 
remaining factors shown in Table 8 
are similar to Cattell’s UI 32 in 
the division of “adjustment” vari- 
ables. As was the case with UI 32, 
however, the identification of some 
of these factors with E-I might be 
questioned. 

If it is asked, then, whether extra- 
version-introversion and adjustment 
are independent, in the sense that 
variables reflecting “good” and “poor” 
adjustment are as frequently associated 
with extraversion as with introversion, 
a clear-cut answer cannot be given. 
It is evident that many of the ques- 
tionnaire factors do not meet this 
second criterion, and for most of the 
factors which are independent in this 
sense, there is some doubt about their 
relationship to E-I. 


CONCLUDING REMARKS 


The present review was prompted 


by the recent burgeoning of interest 
in extraversion-introversion, and by 
the fact that current assumptions 
about the unidimensionality of the 
construct, and its independence from 
adjustment, cannot be justified in 
terms of the research covered by the 
last comprehensive review (Eysenck, 
1953). An examination of more re- 
cent research has shown the evi- 
dence on both issues to be equivocal, 
and the status of extraversion-in- 
troversion as a dimension of per- 
sonality thus remains somewhat 
tenuous. 

In concluding, it is well to point 
out what appear to be the major im- 
plications of the research reviewed 
here. First, the “nomological net- 
work” developing from Eysenck’s 
earlier review has begun to be tied 
down to observable data—a repli- 
cated factor here, a series of inter- 
correlations there—and, while a 
great many gaps remain, there is rea- 
son to believe that further research 
along these lines will not be wasted. 
Second, the most profitable directions 
for such research seem to be clearly 
indicated. There are variables whose 
relationships to extraversion and ad- 
justment need to be clarified. There 
are factors whose widely differing 
patterns across studies need to be 
accounted for. There are areas 
which have not been—or are just 
beginning to be—systematically ex- 
plored. There are hints that extra- 
version-introversion may be differ- 
entially manifested in males and fe- 
males, and in well-adjusted and mal- 
adjusted individuals; both possibili- 
ties need to be followed up. Finally, 
and perhaps most important, there is 
a need for broadly conceived analyses 
oriented toward extraversion-intro- 
version and its relationship to ad- 
justment. Such analyses would 
necessarily include a wide array of 
variables from all media—variables 
selected for their relevance to the 
two dimensions, and, when possible, 
variables of known factorial com- 
position, so that the resulting factors 
could be compared empirically with 
previously discovered ones. Until 
such further steps are taken, the 
issues raised here are not likely to be 
resolved. 

In the meantime, a word of cau- 
tion seems in order. If the term 
extraversion-introversion is to con- 
tinue in psychological usage—and, 
judging from past history, there is 
little likelihood that it will not—care 
must be taken to specify its concep- 
tual and operational referent. What 
appear to be minor distinctions be- 
tween the various conceptions may 
in fact be crucial ones; to discard 
them too hastily is likely only to 
propagate the illusion of a unity not 
yet established. 


This report is a review of published 
psychological studies which involve a 
comparison of Negroes and whites in 
the United States during the period in 
the main from 1943-1958. The pe- 
riod chosen covers work reported fol- 
lowing Klineberg’s (1944) review. 
Where called for, for completeness, we 
have selected studies from previous 
years. Most of the literature to 
which reference is made reports direct 
comparisons between Negroes and 
white groups. However, in some in- 
stances direct comparisons were not 
made, but inferences could be drawn 
—e.g., where a test standardized on a 
white population has been admin- 
istered to a Negro group. For the 
purpose of inclusion in this review, 
research has been considered relevant 
wherever the individual authors have 
stated that Negro-white comparisons 
were made or wherever the stated 
population in studies utilizing white- 
standardized tests is Negro. Thus 
the populations compared have con- 
sisted of varying degrees of “racial 
pureness.” We have reached into 
studies of physical development and 
sociological research where we have 
deemed it important for psychologi- 
cal completeness. 

Two earlier reviews of racial psy- 
chology (primarily concerned with 
mental differences) have appeared 
in this Journal (Garth, 1925; Wood- 
worth, 1916). For comprehensive 
discussions of the Negro and his 
problems the reader is referred to 
Myrdal (1944), Klineberg (1944) 
and Canady (1946). 

In addition to these references, 
racial differences are discussed in 
many general works, some prior to 
our period (Anastasi & Foley, 1949; 
Bendix & Lipset, 1953; Benedict & 
Weltfish, 1943; Boyd, 1950; Dunn & 
Dobzhansky, 1946; Frazier, 1957; 
Garth, 1931; Ginzberg, 1956; Kar- 
diner & Ovesey, 1951; Knox, 1945, 
1949, 1952; Lindzey, 1954; Montagu, 
1952; Sarason & Gladwin, 1958; Ty- 
ler, 1956). We have endeavored in 
the following review to confine our- 
selves to the experimental literature; 
only in the sections on values and at- 
titudes and on emotional disturb- 
ances have we departed to some ex- 
tent from the rule. Over 200 addi- 
tional references were reviewed, but 
not included among our references, 
either because they have been ade- 
quately covered in other reviews, or 
they lacked significance or relevancy. 
Less than a dozen possibly significant 
theses were abstracted in various 
places, but were not studied by the 
writers. 


PHYSICAL AND MOTOR 
DEVELOPMENT 


Physical Status of Infants and 
Children 


Gestation, birth, and early physi- 
cal development may have decisive 
influence upon later psychological 
functioning but are of importance in 
themselves. Even later infancy is 
measured psychologically in terms 
of fine and gross muscle coordina- 
tions. Accordingly, we summarize 
briefly studies of physical and motor 
development of children. 

Brown, Lyon, and Anderson (1945) 
report a significantly higher rate of 
prematurity among Negro infants 
than among whites if the criterion of 
birth weight is used as a standard. 
However, if allowance is made for the 
generally known fact that Negro in- 
fants weigh less at birth (for what- 
ever reasons), the difference in pre- 
maturity rates disappears. 

White infants are on the average 
longer than Negro infants if the re- 
sults of a 1943 survey (Meredith, 
1943) of North American research 
still hold. The possibility is that some 
change occurs as socioeconomic dif- 
ferences are reduced. At least, such 
an inference could be made from 
studies conducted in which socioeco- 
nomic levels have been held constant 
(Rhoads, Rapoport, Kennedy, & 
Stokes, 1941; Scott, Cardozo, Smith, 
& DeLilly, 1950). Rhoads et al. 
(1941) followed white and Negro 
children from four months to four 
years in the outpatient department 
of a children’s hospital. From the 
same lower socioeconomic groups, 
Negro infants actually tended to be 
taller from about nine months of age. 
Body weight and head circumfer- 
ence, eruption of teeth and 
velopment of bo 


2 year of life from 
the lower middle class were com- 
pared with those of white infants 
from comparable socioeconomic lev- 
els. Michelson (1943) concluded that 
the weight patterns are very similar 
when individuals are placed on the 
same dietary regime. 


Psychomotor Development 


Tests of infant intelligence have 
been suspect (possibly unjustifiably) 
for many years as predictors of sub- 
sequent intellectual performance. 
Nevertheless, as measures of com- 
parative psychomotor developmental 
level, they perform a useful task. 

The most widely known study is 
that of McGraw (1931) which was 
reported prior to the period covered 
by this review. In a comparison of 
Negro and white infants in a South- 
ern community on the Buhler Baby 
Tests, McGraw discovered that white 
babies exceeded Negro babies in per- 
formance. The former were also su- 
perior in height and weight. 

Pasamanick (1946), contending 
that McGraw was measuring differ- 
ences in opportunities for adequate 
care and feeding rather than native 
abilities, endeavored to overcome ob- 
jections to the previous work by 
utilizing New Haven infants some- 
what comparable in weight and 
height factors. Comparison on an 
infant development scale suggested 
that differences are not between 
groups so much as within racial 
groups. Negro babies proved to be 
somewhat accelerated relatively in 
gross motor behavior. 

Inadequacies of sampling, as well 
as the more general difficulty of esti- 
mating skin color subjectively, tend 
to invalidate Pasamanick’s conclu- 
sions. His infants may very well have 
been comparable for other reasons 
than the quality of diet during and 
after pregnancy to which Pasamanick 
attributes his results. 

In a follow-up study utilizing the 
same Negro infants, Pasamanick and 
Knoblock (1955) determined that de- 
pressing influences on the Develop- 
mental Quotient of exogenous factors 
were not important up to at least two 
and one half years of age. Whatever 
methodological difficulties are sug- 
gested by the original study, the fact 
that the group of Negro infants main- 
tained their DQs for so long a period 
of time indicates that they were 
maintaining their relative develop- 
mental standing, including intellec- 
tual functioning. 

What appears to be better-con- 
trolled research by Gilliland (1951) 
employed the Northwestern Infant 
Intelligence Scale rather than the 
Gesell scales (understandably, since 
Gilliland developed the former). In 
three separate studies, at least one 
of which controlled for major vari- 
ables, Negro infants in Chicago had 
IQs as high as or higher than white 
infants. The results may, the author 
suggests, be interpreted as manifesta- 
tions of the greater maturity in 
motor behavior Negro babies are 
known to have, or as a resultant of 
crowded quarters in which Negro 
babies receive greater social stimula- 
tion. Inasmuch as the former ex- 
planation does not seem to be found 
when socioeconomic variables are 
controlled, the latter may be more 
acceptable. 


The remainder of the comparisons 
of psychomotor and physical de- 
velopment is summarized in Table 1. 
Generally, the results indicate white 
and Negro children represent the 
same populations in respect to each 
of the variables measured. 

In a study obliquely related to 
white-Negro comparisons, Codwell 
(1949) separated a Negro group into 
three groups varying in degrees of 
“Negroidness.” Composite motor 
functioning did not change from one 
group to another, implying no differ- 
ences between those more Negroid 
and those more white. 


Overview of Physical and 
Motor Development 


On some physical measurements of 
children, especially anatomical meas- 
urements of substructures, a racial 
difference appears to exist between 
whites and Negroes. After equating 
for socioeconomic variables, investi- 
gators on the whole find that differ- 
ences in psychomotor functioning 
tend to disappear. 


PSYCHOPHYSICAL FUNCTIONS 


A few studies have endeavored to 
compare white and Negro subjects on 
variables which may roughly be 


TABLE 1 
COMPARISONS OF INFANTS AND CHILDREN IN PSYCHOMOTOR AND PHYSICAL DEVELOPMENT 
Author(s) Topic Subjects Results Comments 
Williams & Development (Gesell) | W low S-E® W>N | Authors attribute to differential child- 


Scott (1953) 


Scott, Ferguson, 12 neuromuscular steps | Infants 


status, N high 
S-E status 


rearing, e.g., greater permissiveness in 
lower class 


N>W | N from low S-E class >N from higher 


Jenkins, & S-E class up t veek, 5 
Cutler (1955) fe ate uy POIsStU week Same ee 
Irwin (1949) Speech Infants: Professionals’ children h; 
1-10 days WEN | abolic curves of nouetio develonuient 
ist 30 mos. | WSN | than laborers’ children 
Moore (1942) Eye-hand coordination | Preschool WEN | From 24-35 mos. N child superior to W 
WSN 


Espenschade 
(1946) ity 


Ramsey (1950) Pubertal changes 


boys 


Coordination and agil- | 10th grade 
girls 


Adolescent WSN 


a Socioeconomic. 

classed as biological, but which have, 
we know, psychological components 
of varying degrees of importance. 
Measurements of pain thresholds 
by Chapman (1944) suggest that 
Negroes perceive radiant heat at a 
lower threshold than do whites. In 
this experiment 18 Negroes and 18 
Americans of southern European an- 
cestry were employed; but even 
though for psychophysical experi- 
ments a small N would be satisfac- 
tory, the use of psychoneurotics as 
well as normal persons might bias the 
results for a normal population. Fur- 
ther, only raw data are given without 
appropriate statistical treatment. 
Among selectees and inductees, 
2,200 Negro and 21,000 whites, it was 
revealed that more colored selectees 
had normal vision in each eye than 
did whites. No decrease in the dif- 
ferences was noted with age. It was 
also determined that the poorer eye 
fared better among Negroes than the 
better one did among whites. Kar- 
pinos (1944) who reports the study 
gives no reason for the findings. In 
the Canal Zone, Covell (1950) dis- 
covered that Negroes develop pres- 
byopia 5 to 10 years before whites do. 
Other than a higher incidence of lues 
and tuberculosis, the usually ac- 
cepted causes of presbyopia do not 
seem to offer an explanation for the 
differences. Personal insecurity and 
social inequality are suggested by the 
author to be of fundamental importance
in setting off organic
changes within the lens.
absolute visual thresholds a 
under dark ad i 
and become pr 
the pigment 
lighter or is re 
study by Helson and Guilford (1933). 
As the center 0 
the difference grows less (as it does 
between white persons with different 
colored eyes). Woisika (1944) concludes
that the Negro is superior to
the white in dark adaptation. Cor- 
relation of age and dark adaptation 
(Negro mean age=39, white =46) 
may vitiate the conclusion.
Some racial differences appear in 
psychophysical functions. Much 
more research of the type Eysenck, 
Granger, and Brengelmann (1957) 
have done on perceptual processes 
and mental illness will have to be 
done in order to discover what part
genetic factors play in these differences.


INTELLECTUAL FUNCTIONS 


Shuey (1958) has reviewed the lit- 
erature comparing Negro and white 
intelligences at least as far back as 
1913. We shall not endeavor to cover 
the same ground. In her text are 
found valuable tabular comparisons 
of Negroes and whites for various age 
groups, the armed forces, gifted and
retarded, delinquents and criminals,
and racial hybrids. (Actually a number
of studies reported in the literature
are of hybrids even though not
recognized as such.) Shuey’s bibliography
and résumés are a must for
serious students, for she has not only
gathered together the better known
studies, but has ferreted out otherwise
obscure and inaccessible articles
and theses.

The usefulness of Shuey’s other- 
wise excellent work is limited by 
what appears to be a polemic atti- 
tude. Her book seems to be an at- 
tempt to prove a nonegalitarian hy- 
pothesis rather than being strictly a 
review of literature. In this book
Shuey does the same rationalizing 
from an hereditarian standpoint that 
Klineberg (1944) did in his earlier
“review” from an environmental
standpoint.


North (1957) likewise surveys the
literature and comes to an opposite 
conclusion from the one Shuey 
reaches. Whereas the latter concluded
that all the evidence points “. . . to 
the presence of some native differ- 
ences between Negroes and whites as 
determined by intelligence tests,” 
North maintains that there is no 
proof of biological inferiority or that 
the Negro’s potentials for educa- 
tional and cultural development are 
more limited than the white person's. 
North’s coverage of the literature is 
considerably less extensive than Shu- 
ey’s. 

The following discussion on com- 
parisons of intelligence between the 
two racial groups endeavors to sup- 
plement Shuey’s work especially and 
correct it where it is in patent error. 


Children and Adolescents 
Young Children 


Aside from the infant studies cited 
previously, studies of preschool chil- 
dren in which comparisons are made 
between whites and Negroes are 
sparse. Shuey (1958) cites only nine 
altogether and only five reported in 
1944 or after. On the whole, young 
Negro children score lower than 
whites. But the differences are very 
much less than in older groups; and 
in all of the reports in which average 
IQs are given, Negroes average well 
within the normal IQ range for 
whites. Shuey offers several explana- 
tions of the discrepancy between the 
results for preschool and school chil- 
dren, including inadequate sampling 
of preschool, the relative invalidity 
and unreliability of tests at younger 
ages (although some of the lowest 
Binet standard errors are found in 
the late preschool years), more verbal 
and abstract tests in the school years; 
she also suggests that mental growth 
curves may not be the same for both 
races, and that IQs may be less af- 
fected by environment in the pre- 
school years. 

Special comment is called for with
respect to two investigations. Brown
(1944) compared Minneapolis kinder- 
garten children on the Binet, Form L, 
discovering that the Negro mean of 
100.8 was significantly lower than 
the white mean of 107.1. The white 
children in Occupational Classes VI 
and VII (Minnesota scale) averaged 
about the same as Negro children. 
Shuey takes exception to Brown’s 
conclusion that at “nominally sim- 
ilar” socioeconomic status Negro 
children are not inferior to whites. 
Her observation that Brown has 
small Ns in Levels VI and VII does 
not constitute an objection from a 
statistical standpoint. Nevertheless, 
Brown’s conclusion would have been 
stronger if he had, first, differentiated 
his Negroes by class without assum- 
ing their occupational status and, 
second, employed an analysis of vari- 
ance design. 

The other study to which special 
attention must be paid is that of 
Anastasi and D’Angelo (Anastasi &
D’Angelo, 1952; D’Angelo, 1950).
Five-year-old children in mixed and 
unmixed neighborhoods in New York 
City were administered the Good- 
enough Draw-a-Man Test and studied 
for language development in spon- 
taneous conversation recordings. IQs 
were 101.8 and 101.5 for Negro and 
white children, respectively. Language
development appeared to be
somewhat more advanced for white 
than for colored children. This im- 
portant study is dismissed by Shuey 
as permitting no generalizations, be- 
cause selection of subjects appeared 
to be biased. The crux of the matter 
is whether all qualified subjects were 
utilized or only certain selected ones. 
Anastasi² points out Shuey’s error
in misjudging the selection proced- 
ure, and provides a satisfactory 


2 Anastasi, Anne. Personal communication, 
July 19, 1958. 
answer to Shuey’s objections. With
due recognition of the limitations of 
the Goodenough as a test of intelli- 
gence we may yet regard Anastasi 
and D’Angelo’s results as a challenge 
to nativist theories of intellectual 
differences between the races. 


Older Children and Adolescents 


Little question need be raised at 
this time about the results of testing 
school children. Almost all evidence 
points to inferior performance of Ne- 
groes on tests of either the more tra- 
ditional variety or those tending to 
be more “culture-free” or “culture- 
fair.” What Shuey (1958) has done 
in this respect is not to present star- 
tlingly new conclusions, but to mar- 
shall data which have been more or 
less familiar to scholars for many 
years. Our purpose here, then, is to 
re-examine some of the data and pres- 
ent material not covered by Shuey, 
rather than to repeat what Shuey 
has done with a fair degree of
thoroughness.

McGurk's studies. 
long time interest (McGurk, 1943) in 
Negro-white comparisons of intelli- 
gence, McGurk (1951, 1953a, 1953b) 


account for differences 
and Negroes in intelle 
ing. Two hundred thi 
and 213 white youths w 
for age and curriculum a 


socioeconomic variables were thus
controlled, the mean scores of whites
and Negroes differed significantly in
the direction usually reported. Also,
when the specially devised test items
were separated into “cultural” and
“noncultural,” the differences were
greater on the noncultural questions 
than on the cultural questions, con- 
trary to environmentalist indications. 
Further, as socioeconomic status in- 
creased, the differences between Ne- 
groes and whites increased rather 
than decreased, again contrary to ex- 
pectations from a cultural theory of 
differences in intelligence. 

One point at which McGurk’s anal- 
ysis might be misleading to the sta- 
tistically untrained has been pointed 
out by Long (1957). McGurk states 
that 25% of Negroes overlap whites, 
from which it might be concluded 
that only 25% of the specified Negro
population have scores in common
with the specified white population.
Actually, analysis of McGurk’s data
shows that 91% of the Negroes have
scores in common with the whites.
Inasmuch as McGurk has addressed
himself to the lay public, he should
make abundantly clear that “overlap”
is used in the technical sense of
exceeding (or for an upper distribu- 
tion, falling below) the mean or 
median of another group. 
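
The two senses of “overlap” distinguished above can be made concrete with a small sketch. The scores below are invented for illustration and are not McGurk’s data, and both function names are hypothetical. The first function gives the technical sense (the percentage of the lower-scoring group exceeding the median of the other group); the second gives the lay sense (the percentage of one group whose scores fall within the range of scores obtained by the other).

```python
from statistics import median


def technical_overlap(lower_group, upper_group):
    """Percent of lower_group scoring above the median of upper_group."""
    cut = median(upper_group)
    return 100 * sum(s > cut for s in lower_group) / len(lower_group)


def scores_in_common(group, other):
    """Percent of group whose scores lie within the observed range of other."""
    lo, hi = min(other), max(other)
    return 100 * sum(lo <= s <= hi for s in group) / len(group)


# Invented scores, for illustration only (not data from any study cited).
group_a = [38, 42, 45, 47, 50, 52, 55, 58, 61, 66]
group_b = [44, 49, 53, 56, 59, 62, 65, 68, 72, 77]

print(technical_overlap(group_a, group_b))  # 20.0
print(scores_in_common(group_a, group_b))   # 80.0
```

With these invented figures the same two sets of scores yield a technical “overlap” of 20% while 80% of the scores in the first group lie within the range of the second, which is the kind of discrepancy Long (1957) points to.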

McGurk does not claim, though he 
seems to imply, that the superiority
of whites on intelligence tests results 
from innate factors. In the final anal- 
ysis the implication may be correct, 
but research such as McGurk’s can- 
not establish it. Indeed, we do not 
see how the issue can be resolved by 
any number of ingenious methods of 
equating for social and economic 
variables. The various indices of 
socioeconomic status already devised
or those at present conceivable on
the same principles are intended to
distinguish social classes from one another.
How they can be employed to
compare individuals in different
castes, except very roughly, is difficult
to see.

In actuality, not only in the South
(Dollard, 1949), but North as well
(Brown, 1944; Long, 1957) whites
and Negroes comprise separate castes; 
they are not merely representatives 
of different classes. In the state of 
Florida where the writers reside there 
are a number of Negroes whose social 
and economic statuses exceed those 
of most white persons. These Ne- 
groes, however, cannot yet sit in the 
same seats on public transportation 
(in most places), go to the same hotel, 
restaurant, club, school, church, so- 
cial events, or even restrooms. Al- 
though some of these strictures do 
not hold in Northern states, attitudes 
regarding intermarriage and the more 
personal forms of social intercourse 
do not appear greatly different from 
those held in the South. From the 
Early Childhood Project (Radke, 
Trager, & Davis, 1949) we learn that 
in Pennsylvania, from which state 
McGurk drew part of his sample, 
children discern within at least the 
first four or five years of life their so- 
cial and ethnic roles, with attendant 
supervaluations or devaluations of 
self and performance expectations. 
Interlocked with caste variables are 
those which influence performance, 
such as the color of the investigator 
(Trent, 1954), which in turn may be 
related to deterioration of intellectual 
performance under anxiety-provok- 
ing conditions (Beier, 1951; Ham- 
mer, 1954). 

We wish to emphasize that we are 
not taking sides at this point in the 
heredity-environment controversy in 
relation to intellectual differences; 
we believe both camps (e.g., Canady, 
1943, for environmentalists) have 
mistakenly assumed that if the two 
racial groups are equated in terms of 
social class and economic variables 
that a definitive answer can be found 
concerning even the relative weight- 
ing of innate or acquired factors. In- 
volved here are different dimensions 
(possibly correlated, of course) not
merely different quantities along the
same scale. Quibbling with McGurk 
over minor points of methodology 
should not obscure the value of the 
valiant attempt he has made to test 
hypotheses proposed by the environ- 
mentalists. The error lies in the as- 
sumptions both he and his opponents 
make. 

Studies on the WISC. Only one in- 
vestigation (Young & Bright, 1954) 
is cited by Shuey under WISC studies. 
Not surprisingly in view of its stand- 
ardization, the WISC was found in- 
appropriate for testing Southern Ne- 
gro children from 10 to 13 years of 
age. 

Another more extensive study has 
come to the writers’ attention, that 
of Caldwell (1954). Four hundred 
and twenty Negro children were 
tested ranging from 6 to 12 years of 
age, with equal numbers of males and 
females, drawn from towns in five 
deep South states and randomly 
selected from school rosters. One ex- 
aminer tested 342 of the subjects. Ac- 
cording to the report, “excellent rap- 
port was obtained,” although the 
means of establishing rapport appear 
inadequate especially in the light of 
the conspicuous Southern accent of 
the chief white examiner. Caldwell’s 
hypothesis that a difference exists be- 
tween Southern Negro children and 
the white standardizing group was, 
as might be expected, borne out. So- 
cioeconomic class influence was prob- 
ably strong, with 75% of the subjects 
in the lowest third of SE groupings. 
Nevertheless, the Full Scale IQ mean
of 85.52 is considerably higher than 
that obtained in the Young and 
Bright study (mean = 67.74). (Some- 
thing is wrong with the standard de- 
viations reported on p. 18 of Cald- 
well’s dissertation: SDs ranging from 
0.63 to 1.4 sound more like standard 
errors, but do not seem to jibe with 
either the WISC Manual’s, the data 
of Young and Bright (1954), or the
spread one might expect.) In this in- 
vestigation the suggestion is also 
made that cultural bias results from 
using the WISC, standardized as it 
was on a white population.
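
The remark above that the reported values “sound more like standard errors” rests on the usual relation between a standard deviation and the standard error of a mean, SE = SD / √n. A minimal sketch of that arithmetic follows. The SD of 15 is the nominal WISC IQ standard deviation; the figure of 420 children comes from the description of the study, while 60 per age level assumes equal groups across the seven ages from 6 to 12, which the report as summarized here does not state.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean, given a standard deviation and sample size."""
    return sd / math.sqrt(n)


NOMINAL_WISC_SD = 15  # nominal IQ standard deviation (assumed to apply here)

print(round(standard_error(NOMINAL_WISC_SD, 420), 2))  # whole sample: 0.73
print(round(standard_error(NOMINAL_WISC_SD, 60), 2))   # one assumed age group: 1.94
```

Under these assumptions a standard error for the whole sample falls inside the reported range of 0.63 to 1.4, whereas a standard deviation of IQs would be expected to be roughly twenty times larger.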
“Culture-free” or “Culture-fair”
studies. McGurk’s attempt to differentiate
cultural and noncultural
tests from one another is only one in a
series which began with the use of the 
Army Beta in the first World War. 
Although the first performance tests 
were intended to overcome literacy 
handicaps, they became in time use- 
ful for estimating intelligence across 
ethnic, racial, and cultural barriers.
In Table 2 are summarized endeavors 
to compare Negro and white children 
by means of tests which were either, 
like the Beta, adapted for cross-bar- 
rier use, or specifically constructed to 
eliminate cultural factors. Some of 
these studies are cited by Shuey, but 
are included here as essential for understanding
this area of research.
As far as we have been able to find,
no one has employed Cattell’s “culture-free”
test (Cattell, 1940, 1951;
Cattell, Feingold, & Sarason, 1941)
for comparing Negroes and whites.
This test has yet to prove itself, however,
as even “culture-fair,” to say
nothing of “culture-free.” Marquart
and Bailey (1955) present convincing
evidence that the “culture-free”
test has sufficient variance associated 
with the Binet to suggest they are 
measuring much the same capacities. 
Scale 1 correlates with the Binet .74 
and Scale 2 correlates .81. Most test 
constructors who would want to 
measure what the Binet measures
would be very happy to obtain such
coefficients. What Goodenough and
Harris (1950) say about the
Draw-a-Person Test appears relevant to all
“culture-free” or “culture-fair” tests: 


The search for a culture-free test is 
illusory, 


TABLE 2
COMPARISONS ON “CULTURE-FREE” OR “CULTURE-FAIR” TESTS

Author(s) | Test | Subjects | Results | Comments
Woods, Boger, & Holman (1954) | Beta (Lindner-Gurvitz Scale) | N adolescents, delinquents, nondelinquents | Subtest 4 low | Authors conclude aim of subtest equality not fulfilled or cultural factors deprive N on subtest 4
Woods & Toal (1957) | Revised Beta | N & W adolescents matched for IQ | W>N on Paper Formboard, Drawing Completion; N>W on Digit Symbol, Visual Comparisons | Authors suggest Detection of [illegible] and Drawing Completion are culture-loaded; probably ex post facto theorizing
Love & Beach (1957) | Davis-Eells Games | N & W, three S-E levels | W>N | Differences obscured by N’s being in two lower S-E levels only
Fowler (1957) | 3 “culture-controlled” tests | N & W, [illegible] & nonethnic | W>N | Positive correlation also found between conventional and “culture-controlled” tests and S-E status
Hammer (1954) | CTMM (nonlanguage) | N children and adolescents | N<W norms | Results same with both language and nonlanguage factors
Newland & Lawrence (1953) | Chicago Non-Verbal Exam. | E. Tenn. N school children | N<W norms | Differences 2, 3, and more years
Coppinger & Ammons (1952) | Full-Range Picture Vocabulary Test | N children from [illegible] La. parishes | N<W about two years | At upper age levels urban N children’s mean scores ≈ W norms
[illegible] | “Culture-fair” test of 16 items | High & low status W, low status N | [illegible] | No difference between W groups. Smaller difference between N & W on “culture-fair” test than on standard intelligence test

Gifted children. Shuey’s review of
comparative studies (both directly 
and by implication) reveals that 
whites produce a greater proportion 
of gifted children by far than do Ne- 
groes. The percentage reported by 
Shuey for the latter, about 0.14% or
0.15%, is well below the 0.95% for
white children testing 140 IQ or
above on the Binet or comparable
scales. This low proportion among
Negroes is only an expression of the
general situation, i.e., that the whole 
curve for Negroes on most intelli- 
gence tests is displaced downward. 
Even taking Jenkins’ (1950) figure of 
0.3% for the 140 IQ or above places 
the area under the upper end of the 
curve below that for whites. 
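
The statement that a low proportion of very high scores is “only an expression” of a downward displacement of the whole curve can be illustrated under a normal model. The sketch below is an illustration of the arithmetic, not a reanalysis: the normal shape, the standard deviation of 16, and the two means are assumptions chosen for the example and do not come from Shuey or Jenkins, and the function name is hypothetical.

```python
from statistics import NormalDist


def percent_at_or_above(cutoff, mean, sd=16.0):
    """Percent of a normal distribution falling at or above cutoff."""
    return 100 * (1 - NormalDist(mean, sd).cdf(cutoff))


# Two hypothetical distributions that differ only in their means.
for assumed_mean in (100, 90):
    print(assumed_mean, round(percent_at_or_above(140, assumed_mean), 2))
```

In this model a displacement of the mean by well under one standard deviation changes the percentage above a fixed high cutoff several-fold, which is why percentages in the extreme tail are so sensitive to the position of the whole distribution.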

In surveying cases of gifted Negro 
children, Jenkins at first (1943) main- 
tained that the gifted Negro child has 
essentially the same characteristics 
as a comparable white child. But 
later Jenkins (1950) recognized from 
more intensive investigation that the 
most important single fact for any 
Negro, gifted or not, is his being a 
Negro. Consequently, the perform- 
ance he manifests on an intelligence 
test as well as elsewhere is literally 
colored by this fact. 

That Negro children earn IQs of 
160 or above (Jenkins, 1943) or even 
as high as 200 (Theman & Witty, 
1943; Witty & Jenkins, 1935) on 
tests standardized on whites is a re- 
markable phenomenon. Anyone who 
has tested Negro children has prob- 
ably been impressed with the fact 
that a Negro child whom the exam- 
iner knows to be functioning as a 
normal, not a retarded child, may re- 
ceive a score which automatically 
would classify him in the retarded 
range if scores only were regarded. 
Roughly speaking, the Negro child 
seems to operate in everyday life situ- 
ations in a way expected of a white 
child about 10 IQ points above. With 
the curve of measured Negro intelligence
displaced downward it is thus
a surprise to find any Negro children 
scoring among the highest levels on 
white-standardized tests. 

Specific abilities. There have been 
few studies comparing Negroes and 
whites on specific abilities. Only two 
such reports worthy of mention came 
to the attention of the writers. 
Prothro and Perry (1950) adminis- 
tered the Meier Art Test to 460 high 
school and college students in Louisi- 
ana and found that mean score com- 
parisons placed the whites above the 
Negroes. However, the authors felt 
that the differences might be due to 
the socioeconomic status of the 
groups. Negro and white sixth 
graders were tested on the Kwalwas- 
ser-Dykema Music Ability Tests 
with the Negroes obtaining a higher 
median score (Woods & Martin, 
1943). The authors conclude that 
the Negroes were superior in a num- 
ber of specific areas, but the meager 
statistical data presented do not sup- 
port their conclusions. 

Klugman (1944) in a study not 
mentioned by Shuey discovered that 
money incentives and praise had no 
demonstrable effects on white chil- 
dren (CA 7-14) in taking the alter- 
nate form of the Binet after an initial 
first form. However, comparable 
age Negro children given money re- 
wards showed better performance 
than those given praise as an incen- 
tive. (Incidentally the two groups 
averaged 99.11 for whites and 98.00 
for Negroes, with 6.08 and 5.66 SDs, 
respectively.) 

In an investigation which com- 
pares with similar studies on white 
children, Robinson and Meenes (1947) 
found the relation between Negro 
children’s test intelligence and parental
occupation very low. From
1938-39 to 1945-46, the intelligence 
average rose from 97.0 to 99.7, a 
statistically significant change, and
the relation between occupation of
parents and intelligence increased
as well—in this case it might be bet- 
ter to say that the relative lack of 
relation became less pronounced. A 
greater number of Negroes in higher 
status occupations could increase the 
coefficient attenuated by abbreviated 
spread of scores.

Davis (1948) and Davis and Havighurst
(1946) in a test of the influence
of social class upon learning
draw attention to differences in child- 
rearing practices of lower and mid- 
dle classes, in general, that less rigid 
behavior is expected of children in 
lower classes. Comparing 100 Negro 
and 100 white families, Davis first 
indicates that the Negroes tended to 
be in the lower class. Where Negro 
families could be regarded as middle 
class, their child-training practices 
were conservative in regard to feed- 
ing, toilet training, and masturba- 
tion. It seemed to Davis that differ- 
ences in adult behavior were greater 
within either Negro or white groups 
than between them. On mental tests, 
in which our interest lies principally, 
there were significant differences be- 
tween the high and low socioeconomic 
groups. Davis seems to assume that 
these differences are culturally deter- 
mined, but gives no satisfactory 
evidence in this regard. He does go 
on to urge the development of “cul- 
ture-fair” tests and the elimination 
of academic intellectual tasks in the 
testing of intelligence.


Many years ago Sunne (1917) discovered
that on som


ve which have’ 
produced the same results. At least
part of the differential performance
may be attributable to status differences
if the lead suggested by Eells,
Davis, Havighurst, Herrick, and
Tyler (1951) is in the right direction.
These investigators found that status 
differences among children affect 
some test items and not others. 


Adults 


In addition to Shuey’s review 
(Shuey, 1958), we report here a few 
other studies from the military serv- 
ices and other special groups for 
which comparisons have been made. 
One massive investigation, part of the 
standardization of the PAT (Tomkins 
& Miner, 1957), is not mentioned by 
Shuey, but by all means should be 
brought to the reader’s attention because
of the careful selection of representative
subjects. The investigator
(Miner, 1957) employed a 20-word
vocabulary test adapted from
the CAVD. The sample of 1500 
individuals was chosen from stratified 
random clusters of blocks or sas 
areas with quotas on important
variables. Whites averaged 11.06,
SD 3.41, and Negroes 8.08, SD 2.72,
a significant difference.
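
The size of the difference Miner reports can be expressed in standard deviation units. The sketch below uses only the means and SDs quoted above. Because the separate group sizes are not given here, it pools the two variances with equal weight, which is an assumption; the normal-model figure at the end is likewise an illustrative approximation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def standardized_difference(mean_1, sd_1, mean_2, sd_2):
    """Mean difference divided by the equally weighted pooled SD."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


# Means and SDs on the 20-word vocabulary test, as quoted in the text.
d = standardized_difference(11.06, 3.41, 8.08, 2.72)
print(round(d, 2))  # 0.97

# Percent of the lower-scoring group expected to exceed the other group's
# mean if scores were normally distributed (an assumption).
pct = 100 * (1 - NormalDist(8.08, 2.72).cdf(11.06))
print(round(pct, 1))
```

On these assumptions the two means lie a little under one pooled standard deviation apart, and the second figure is an “overlap” in the technical sense of exceeding the mean of the other group.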


Armed Forces 


In the period since 1944 debate 
over testing of intelligence in both
world wars has been carried on. Al- 
though it seems to have subsided, it 
is not so much that the issues have 
been settled as that the contestants 
have become exhausted. To Shuey’s
conclusion, that in both wars whites 
consistently did better than Negroes 
on the average, few would take ex- 
ception. She takes up one by one the 
reasons advanced by others for the 
discrepancies and provides reasons on 
the opposite side which purportedly 
answer an egalitarian view. The 
reader will have to study Shuey in
order to judge whether the data support
her reasoning. Additional discussion
of some of the issues involved
can be found in a series of articles 
by Garrett (1945a, 1945b, 1945c, 
1947). 

Several studies not cited by Shuey 
are mentioned here (Altus, 1946; 
Altus & Bell, 1947; Fulk & Harrell,
1952; MacPhee, Wright, & Cummings,
1947), primarily for the sake of completeness
inasmuch as they do not
differ in their results from others. 
Fulk and Harrell endeavored to 
equate groups by the last school 
grade completed; they compared 
AGCT scores and discovered that 
whites were favored at every school 
grade completed. The authors recog- 
nized that equal schooling does not 
render groups equivalent. Origins of 
their subjects, a highly important 
variable, were not controlled. Altus 
and Bell interpreted their discrepant 
findings on illiterates as due to cul- 
tural factors. 

In a survey of “the uneducated” 
Ginsberg and Bray (1953) cite figures 
to indicate that the number of Negro 
illiterates declined from three million 
to slightly more than one million from 
1890 to 1940. In World War II, 
391,300 whites and 325,100 Negroes 
were rejected on the grounds of illit- 
eracy, almost three-fourths of these 
rejectees being from the Southeast 
and Southwest. 

Ginsberg (1956) draws the conclu- 
sion from studies of manpower in the 
Second World War that there is a 
great wastage of human potential 
when one realizes that the absolute 
numbers of Negroes in the upper 
classes of the General Classification 
Test are large, even though percent- 
agewise Negroes have small repre- 
sentation in Classes I and II. The 
late Walter V. Bingham pointed out 
the waste of potential of all groups 
implied by the fact that one-fourth 
of the truck drivers exceeded one- 
fourth of the bank executives. 


Special Civilian Groups 

A large number of comparisons of 
college students reviewed by Shuey 
can be supplemented by only one 
series of studies by Roberts (1946, 
1948, 1950). Northern Negro college 
students did better on the ACE than 
did their Southern counterparts. In a
longitudinal comparison original differences
in academic achievement
total scores were erased after four
years of college, even though ACE 
scores still reflected differences. Dur- 
ing the four-year period the subjects 
made greater gains than those ex- 
pected by the national norms for a 
similar period of time. 

Comparing Southern Negro and 
white venereal disease patients, Scar- 
borough (1956) discovered differ- 
ences on the Wechsler-Bellevue paral- 
leling those of other investigators, 
but the differences between white and 
Negro VD patients did not seem to 
be as great as those between white 
and Negro controls. Davis (1957) 
using the same test found no differ- 
ence between 33 mental patients and 
27 controls, though the overall IQs 
of 67 and 68 are well below white 
norms. Also employing the Wechsler, 
DeStephens (1953) answered his own 
question, “Are criminals morons?”
in the negative when he discovered 
that 200 white and 100 Negro admis- 
sions to an Ohio reformatory yielded 
the following average scores: FS: W 
93.55, N 87.90; V: W 90.13, N 86.70; 
P: W 98.30, N 91.20. On the basis of 
the standard deviations reported 
none of these differences between 
groups is significant. Findings of 
Reitzes (1958) on Negro applicants 
to medical schools yield a different 
result. Generally it is expected that 
Negro applicants will rank lower than 
white applicants on the Medical Col- 
leges Admissions Test. Regional 
differences appeared among Negro 
applicants with North and West 
above border applicants, who in turn
ranked higher than Southerners. 


Overview of Intellectual Functions 


Several grave issues arise in a re- 
view of comparisons of intelligence 
of whites and Negroes. We shall 
phrase these and make a few com- 
ments on each. It will be obvious 
that some of the problems apply 
more generally than to intelligence 
comparisons, but they are set forth 
here inasmuch as all of them are re- 
lated to the foregoing comparisons. 


What Constitutes a Race? 


For convenience we have assumed 
that groups belong to the white 
“race” or Negro “race” as designated 
by the investigator. Any research, 
however, which seriously attempts to 
make comparisons between or among
“races” must sooner or later grapple
with the problem of this section. 
Skin color, hair texture, and other 
physical characteristics have proven 
illusive as definitive criteria. We 
cannot settle a question to which 
anthropologists appear not yet to 
have given 
Nevertheless, 


in order to ma 
tions they are comparing, 


Determination of Racial Composition
of Groups


Assuming some acceptable definition
of race, the investigator needs to
determine in each case whether the 
groups he is comparing actually are 
differentiated by his criteria. If, for 
example, he has adopted heredity, 
the possession of such-and-such a 
percentage of accepted white or 
Negro ancestry, as a criterion, the 
researcher must make certain his 
groups fall within the limits of the
specific heredity he has accepted.
One suspects that in a number of
cases so-called racial comparisons 
are being carried out between one 
group designated as “white” and an- 
other designated as “black” which 
consists of many who are partly or 
even largely white. 


Confusion of Class and Caste Variables 


As we have indicated in comment- 
ing on McGurk’s studies, both hered- 
itarians and environmentalists have 
fallen into the trap of assuming that 
if they can get two groups who are 
equated in socioeconomic terms there 
can be an experimentum crucis which 
answers the question of what parts 
heredity and environment play in intelligence
differences. Canady (1943a)
discussed the great difficulty in equating
environments, and Anastasi and
Foley (1949) pointed out that formal
education and socioeconomic differences
were not enough to account for
differences in tested abilities of northern
and southern Negroes. Actually,
we despair of being able to equate
groups until caste differences are removed
and only class differences
remain.


Interaction of Examiner and Subject
Variables


In some studies (Trent, 1954) the
color of the examiner has been taken
into account in assessing results; in
others this factor has been ignored.
Repeated testing of the same individual,
both white and Negro, by both
white and Negro examiners would
seem to be called for in order to determine
the proportion of variance in
intelligence test scores attributable
to interaction of examiner and subject.
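
The design suggested above is a two-by-two crossing of subject group and examiner group. As a rough sketch of how the proportion of variance attributable to the interaction might be estimated, the example below partitions the sums of squares for a balanced 2 × 2 layout. All scores are invented and the function name is hypothetical. For simplicity the sketch treats the four cells as independent groups, whereas the repeated testing of the same individuals proposed in the text would call for a repeated-measures analysis.

```python
from statistics import mean


def interaction_share(cells):
    """Proportion of the total sum of squares due to the row x column interaction.

    cells maps (row_level, column_level) to a list of scores; the layout
    must be balanced (the same number of scores in every cell).
    """
    n = len(next(iter(cells.values())))  # scores per cell
    scores = [s for values in cells.values() for s in values]
    grand = mean(scores)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    cell_mean = {key: mean(values) for key, values in cells.items()}
    row_mean = {r: mean(cell_mean[(r, c)] for c in cols) for r in rows}
    col_mean = {c: mean(cell_mean[(r, c)] for r in rows) for c in cols}
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_interaction = n * sum(
        (cell_mean[(r, c)] - row_mean[r] - col_mean[c] + grand) ** 2
        for r in rows
        for c in cols
    )
    return ss_interaction / ss_total


# Invented scores: subject group crossed with examiner group.
cells = {
    ("group_1", "examiner_1"): [100, 104, 96, 100],
    ("group_1", "examiner_2"): [98, 102, 94, 98],
    ("group_2", "examiner_1"): [92, 96, 88, 92],
    ("group_2", "examiner_2"): [98, 102, 94, 98],
}

print(round(interaction_share(cells), 3))  # 0.235
```

In these invented data about 24% of the total variation is associated with the particular pairing of examiner and subject, over and above the separate effects of subject group and examiner group.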


The Functional Value of Intelligence 


What kinds of intelligence do white 
persons need and what do Negro
persons need to survive, adjust, and
make progress? It is naive to assume
that the academic types of intelli- 
gence tests which have traditionally 
been the instruments of comparison 
compare in reality Negroes and 
whites in those areas of intelligence 
which they are called upon to use in 
“real life” situations. Intelligence 
test differences between Negroes and 
whites cannot mean the same as they 
mean between two groups of whites. 

If we assume that intellectual 
functions develop adaptively and are 
not entirely determined by heredity, 
we may suppose that intelligence 
tests of the usual variety measure in 
part that which is developed in order 
to achieve success in a certain culture. 
A Negro in a white man’s world re-
quires a kind of intelligence enabling
him to detect from minimal cues how 
a white man is going to react to a 
critical situation. Usually his success 
and sometimes his life depend upon 
this kind of intelligence, hardly ever 
upon whether he can define “eth-
nology.” Tests of intelligence tap-
ping the kinds of intellectual func- 
tioning called for in achieving success 
in the actual world men have to face 
might reveal different results in com- 
parative studies, not only between 
Negroes and whites, but other groups 
as well. 


Newer Concepts of Intelligence 


Practically no research has been 
done comparing white and Negro 
subjects on factors of intellect (Guil- 
ford, 1956, 1959; Thurstone, 1938). 
The only studies of this kind we have 
found are those by Lee (1951) and 
Michael (1947). Even studies with 
performance-type tests, which in a 
sense go outside of academic intelli- 
gence quotient concepts, do not meet 
the need to reach the many factors 
revealed in modern factor-analytic 
research. A world of research is 
called for to determine how the two
groups we are considering compare on
the 50 or more factors of intellect. 
Caution needs to be exercised here in 
respect to the functional value of 
even Guilford’s many factors. These 
were derived from tasks set for high- 
level personnel in the military and 
might not have anything specifically 
to do with what a person in a different
class and caste needs to do. How- 
ever, insofar as these factors cover 
the entire range of intellect, compari- 
sons of factors should be much more 
valid than of gross IQs. 


Significance of Overlapping 
Distributions 


A legitimate question arises in 
statistics in relation to assigning 
individuals to one of two (or more) 
distributions (Horst, 1956). In sta-
tistical theory individual scores in a
distribution may be regarded as 
errors departing from a mean (Yule & 
Kendall, 1949). When two distribu- 
tions are compared, then, it is as- 
sumed that individual scores are only 
errors of observation from their 
respective means, which in turn are 
regarded as estimates of either one 
true population mean (if the null 
hypothesis is not rejected) or of two 
true population means (if the null 
hypothesis is rejected). In the case of 
intelligence comparisons the null hy- 
pothesis may be stated: the obtained 
means of these two distributions of 
intelligence scores differ only by 
chance, i.e., they are really only esti- 
mates of the same true intelligence 
score population mean. On the whole, 
investigators have rejected the null 
hypothesis in comparing Negroes and 
whites and have concluded that the 
two means represent two different 
intelligence population means. 

In connection with any one indi-
vidual, however, especially in the
area beyond the mean of either dis- 
tribution, the question may legiti-
mately be asked: To which intelli-
gence mean does he belong—or, from
which intelligence mean is his score a 
deviate? It seems to be assumed that 
because an individual is white or 
black, his score must therefore be a 
deviate from a white intelligence pop- 
ulation mean or a black intelligence 
population mean. But this assump- 
tion begs the question. The individ- 
ual may be a deviate from the lower 
intelligence population mean whether 
he is white or black, or from the up- 
per intelligence population mean 
whatever his color. The individual 
who is white but in the very low part 
of the scale may in truth be a deviant 
from the upper mean but we cannot 
know this fact merely because he is 
white. We could only be relatively 
sure of this fact if there were no over- 
lap in the absolute sense between the 
two distributions. 
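
The point may be illustrated numerically. What follows is a minimal sketch and not part of any analysis reviewed here: it assumes two normal distributions with a common standard deviation, and the means, standard deviation, and prior proportion are hypothetical values chosen only for illustration. It computes the proportion of overlap and the probability that a given score is a deviate from the upper rather than the lower mean.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def overlap_equal_sd(mean_lower, mean_upper, sd):
    """Shared area of two normal curves that have the same SD (assumption)."""
    d = abs(mean_upper - mean_lower) / sd
    return 2.0 * normal_cdf(-d / 2.0, 0.0, 1.0)


def prob_from_upper(score, mean_lower, mean_upper, sd, prior_upper=0.5):
    """Probability that a score is a deviate from the upper mean.

    prior_upper is a hypothetical proportion of cases belonging to the
    upper distribution; 0.5 is an arbitrary illustrative default.
    """
    upper = prior_upper * normal_pdf(score, mean_upper, sd)
    lower = (1.0 - prior_upper) * normal_pdf(score, mean_lower, sd)
    return upper / (upper + lower)


# Hypothetical example: means one SD apart, in standard-score units.
print(round(overlap_equal_sd(0.0, 1.0, 1.0), 3))
print(round(prob_from_upper(1.0, 0.0, 1.0, 1.0), 3))
```

So long as the two distributions overlap, this probability lies strictly between 0 and 1 for every score, which is the sense in which group membership alone cannot settle the question.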

This statistical consideration is a 
variant of the more general one which 
suggests that other factors than color 
(i.e., genetically determined intel-
lectual concomitants of color) decide
whether an individual makes a high 
or low score on an intelligence scale, 


Statistical situa- 


Social Consequences of Research 
Findings 


book (1958), G: 


: PS, especially 
for mistreatment of Negroes. A view
strictly limited to scientific conclu-
sions would be, “Here are the results
of examination of the data. It is 
not our responsibility to recommend 
courses of action.” If there should be 
significant differences between groups 
which can be shown to arise princi-
pally from genetic factors, the practi-
cal response of the man on the street 
(white or Negro) would almost in- 
evitably be to justify either his treat- 
ing others as inferiors or to accept his 
own position of inferiority as natively 
determined. We are not convinced 
that genetic differences have been 
shown; but even if they were so 
shown, we believe it is incumbent 
upon the social scientist to set forth 
the full picture. The wide overlap 
between white and Negro distribu- 
tions of scores should be pointed out 
so that it is evident that within-group
differences are far greater than be-
tween-group differences. It should
also be shown that ofttimes two
groups of white persons differ signifi-
cantly, and probably in some if not
all cases, partly because of genetic
factors. Social scientists need to be
alert to the implications of their
findings.


EDUCATIONAL AND POST-EDUCA-
TIONAL ATTAINMENT


Table 3 presents mostly direct 
Negro-white comparisons in school 
achievement and in the case of
Rodgers’ study some post-educational
attainment. It is generally recog-
nized that as a national group Ne-
groes receive a poorer education
than whites both in quantity and
quality, a condition reflected in
scores on achievement tests. Other
factors, however, may have to be
adduced for explaining Witty and
Theman’s results.


TEMPERAMENT*


The studies in the following para-
graphs are subdivided according to
the instruments employed. We
sought for a more logical set of divi-
sions, but fell back on this artificial,
yet serviceable classification. One
major reason for employing the test
as a separate category is that the
reader may judge the validity of the
studies partly in terms of his knowl-
edge of the validity of the test. With
the exception of the Murray TAT,
almost all of the instruments utilized
for comparative research have been
gravely questioned by qualified scep-
tics. If studies using some particular
test are scattered throughout a sec-
tion among other results, it would be
more difficult for a reader to judge
whether the results from the particu-
lar test reflect the validity or invalid-
ity of the test or stem from the
experimental conditions. This con-
sideration is not a major one in other
parts of this review.


* Some would use “personality” rather than
“temperament.” We follow what seems to
be the trend in recent years, to regard per-
sonality as the larger set of functions under
which temperament is subsumed. The latter
includes traits or reaction systems like
cyclothymia-schizothymia, introversion-ex-
traversion, neuroticism, and others which are
largely of constitutional origin and not so
much affected by environmental changes as
other functions of the personality such as atti-
tudes and social interests.


TABLE 3
RELATIVE ATTAINMENT OF NEGROES AND WHITESᵃ

Author(s) | Subjects | Results | Comments
Thompson (1956) | College teachers | Pay, morale: W>N; teaching load: N>W | Author points out little research done by N
Ferrell (1949) | Grades 4, 5, 6 | W>N on Stanford Achievement Test (all areas) | W more variable
Bullock (1950) | High school graduates | W>N on Iowa High School Content Exam. | Some question about choice of samples but same results regardless of type of analysis
Witty & Theman (1943) | N youth, Binet IQs 120-200 identified 6 years before | W gifted <N> | Authors state attainment other than educational relatively high
Rodgers (1957) | Baltimore superior and average N identified in | Gifted, superior in achieving middle class, education, etc. | Comparison with W only indirect

ᵃ Bradley (1949) discusses relative literacy rates among military selectees, and Davenport (1946) educational
attainment as it relates to military selection.


Rorschach


It seems to us that there must be
more studies comparing white and
Negro on the Rorschach, but we have
found only two. Morons from both 
groups were compared by Abel, 
Piotrowski, and Stone (1944) in 
terms of specific responses to the ink 
blots. Out of the entire set of com- 
parisons only one showed a real differ- 
ence between the two groups: Ne-
groes gave more M than whites. What
the meaning of this finding is depends 
first upon whether this is a chance 
result in comparison with the prob- 
abilities involved, and/or secondly, 
if it is not chance, upon the meaning 
of M. 

The group Rorschach was admin- 
istered by Stainbrook and Siegel 
(1944) to high school and college 
students. No sampling statistics are 
presented, but responses are handled 
on the “simple basis of the probable 
differences of mean frequency of oc-
currences.” Various determinants
were studied in isolation. The au- 
thors conclude that both high school 
and college Negroes show less fluidity 
in association, that high school Ne- 
gro youth are more emotionally 
stable and less impulsive than their 
white counterparts, and possess less 
anxiety, but that white college stu- 
dents though showing more emotional 
irritability are more mature, possess 
more “general personality resources” 
than Negro college students and
have a “more daring intellect.” Inas- 
much as the group Rorschach has 
better norms but less interpretability 
than the individual Rorschach, it is 
difficult to know how to evaluate this 
investigation in comparison with 
most Rorschach studies which use 
the individual form. Harrower- 
Erickson and Steiner (1945) warn 
against using interpretations from 
individual Rorschach practice for
group Rorschachs. 


Thematic Apperception Test 
Murray TAT 


With a white female examiner Abel 
(1945) found that both white males 
and females and Negro females were 
more communicative on the TAT 
than Negro males, who were of at 
least equal intelligence to the other 
groups. Communicativeness was 
measured by the number of ideas and 
the number of words, both of which 
yielded significant differences. It
scarcely seems surprising in the light
of other research on the TAT and
some knowledge of Negro-white soci-
odynamics that Negro males were
inhibited in the presence of a white
female examiner; but possibly we
are engaging in ad hoc reasoning. At 
any rate, the author failed to point 
out a major disclosure of her data, 
that sex differences were greater than 
racial differences on the measured 
variables. 


... differences between TAT responses
of ... . Fifty of ... -class, New ...
... ched. They ... cards plus one
special mother-child card and their
responses were analyzed by a modi-
fied Stein Schema with 28 need and
22 press categories. Scoring was
blind. On most of the items the two 
groups were similar, a not unimpor- 
tant finding. Nevertheless, 14 out of 
50 chi squares were significant at 
least at the 5% level of confidence, a 
more than chance number of signifi- 
cance tests. Negro boys expressed 
greater hostility in thought processes 
than did white boys, less need for act- 
ing out murderous aggression, but 
about the same for other types of act- 
ing out. The Negroes manifested 
more need to reflect, think and specu- 
late, but showed less desire for estab- 
lishing and maintaining friendly rela- 
tions, expressing admiration and re- 
spect for others, or being respected, 
followed and obeyed. They displayed 
less n Ach. Although white boys re- 
garded others as being more rejectant 
than Negro boys, they also looked 
upon them as being more friendly, 
while the Negro boys viewed the 
environment as more hostile. On the 
whole, it appears from Mussen’s data 
that the self-attitudes of the Negro 
boys and their corresponding atti- 
tudes towards others have suffered 
somewhere along the line, even 
though in most things their tempera-
ments parallel those of white boys.
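
The statement that 14 significant results out of 50 exceed chance expectation can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the 50 tests are independent, which tests scored on the same subjects generally are not; the function and variable names are ours.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


N_TESTS = 50        # chi squares reported
N_SIGNIFICANT = 14  # significant at the 5% level or better
ALPHA = 0.05

# Number of "significant" results expected if every null hypothesis held.
expected_by_chance = N_TESTS * ALPHA

# Chance of 14 or more such results under the independence assumption.
tail_probability = binomial_tail(N_TESTS, N_SIGNIFICANT, ALPHA)

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"P(at least {N_SIGNIFICANT} significant): {tail_probability:.2e}")
```

Under these assumptions only two or three significant chi squares would be expected by chance, so 14 is far more than chance alone would ordinarily produce.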

More research utilizing the (Mur-
ray) TAT is called for to make com-
parisons of temperament variables. 
Of all the instruments employed for 
extra-intelligence investigation herein 
mentioned the TAT has suffered 
least at the hands of critics. It has 
demonstrated its value both clini-
cally and experimentally. Conse-
quently, comparative findings are
not subject to the double jeopardy of
having both methodology and test
instrument attacked. The TAT has 
its faults in terms of interscorer reli- 
ability, but it is a fairly well proven
technique for uncovering psycho-
dynamics.


Thompson TAT


In a logically-motivated and com- 
mendable effort to make it easier for 
Negroes to identify with TAT char- 
acters, Thompson (1949) published 
his Negro version (T-TAT) with 
characters of obviously Negroid fea- 
tures. For some reason Thompson's 
ill-starred attempt has stirred more 
research than one might expect, all of 
which appears to invalidate the as- 
sumption that Negroes identify bet- 
ter with pictured characters of their 
own race than with corresponding 
white characters. Incidental to the 
attempts to refute the assumption 
are direct or implied comparisons of 
Negroes and whites on the T-TAT 
or M-TAT. 

Korchin, Mitchell, and Meltzoff 
(1950), utilizing both T-TAT and 
M-TAT, performed an analysis of 
variance with race and social class 
as independent variables. No signifi- 
cant differences were found between 
two Philadelphia racial samples, but 
there were significant differences
between social classes on length of 
Murray TAT stories. Korchin et al. 
believe that the assumptions under- 
lying the T-TAT are not justified, so 
that a Negro modification is not 
called for. In a fairly well-designed
experiment Riess, Schwartz, and 
Cottingham (1950) reached the same 
conclusion concerning the Thompson 
assumptions and strengthened their 
case in another study (Schwartz, 
Riess, & Cottingham, 1951). 

Four groups of subjects, two white 
and two Negro, were given the M- 
TAT and the T-TAT by Cook (1953), 
the former to one group from each 
race and the latter to the alternate 
group from each race. Confirming 
previous findings on the lack of neces- 
sity for a Negro modification of the 
TAT, Negroes regarded characters 
in both sets of pictures as representa-
tive of people in general. Whites, on
the other hand, looked upon the 
Thompson characters as Negroes 
rather than as persons-in-general. As 
a whole, considering the production 
on both sets, Negroes offered a sig- 
nificantly larger number of alterna- 
tives and of words indicating un- 
certainty, a higher vagueness score, 
and a large number of references to 
the pictures as pictures. Whites gave 
a higher word count. 

Carrying out research on whites 
only, Light (1955) found no signifi- 
cant differences between his matched 
subjects in the productions on the 
M-TAT and the T-TAT presented in 
balanced order. Light's subjects, 
like Cook’s, responded to the T-TAT 
characters as Negroes, with 14 of 
26 subjects using traditional themes 
of Negro inferiority. One difficulty 
in interpreting these results comes 
from our not being able to know 
whether the examiner was white or 
Negro. 

Although Thompson's attempt to 
provide easier identification for Ne- 
gro subjects on a projective test 
has not borne its intended fruit, the 
result of comparing whites and Ne- 
groes seems to be a highlighting of 
the cultural pattern of virtually 
universal use of white characters as 
illustrations, at least in publications 
white people see. Negroes perceive 
white and black as people; whites 
tend to see white as people and black 
as Negroes. Whether the explanation 
is as simple as that of experience with 
the cultural pattern remains for fu- 
ture research to disclose. 


Picture Arrangement Test 


The Tomkins-Horn Picture Ar- 
rangement Test developed by Tom- 
kins and Miner (1957) is probably 
the most extensively standardized 
projective test in the United States 
if not in the world. We are not pre-
pared to say that it is the most ade-


intelligence 


a 20-word 
Thorndike-Gallup vocabulary test, 


together with information on major 


plate 
side-up view of all three positions. 
Scoring is objective—it may be done 


, deter- 
from the norma- 


With the PAT which he helped to
standardize in part, Karon (1958)
performed an extensive series of anal-
yses, utilizing a special discriminant
function, to compare Negroes and
whites with the intent of finding what
effect, if any, caste sanctions have on
Negroes.


As far as w 
“validation” 


above and compared. Final samples 
were chosen from nint 
students from North and South.

Results of the analyses confirmed 
in each case, according to Karon, the 
hypothesis that caste sanctions affect 
adversely the “personality character-
istics” of Negroes. Significant differ-
ences between discriminant score
means are interpreted as the result
of differential caste sanctions. These
differences appear to indicate that
actual possession of and simultaneous
denial of aggression is the principal
problem of the individual living under
caste restrictions, with a secondary 
resultant of flattened affect. There 
are differential “human costs” even 
between areas of more severe and 
those of less severe sanctions. 

Some methodological umbrage could 
be taken in view of Karon’s use of 
subjects taken from the standardiza- 
tion group for part of his experiment. 
Nevertheless, the results of his pilot 
studies were validated on what seem
to be entirely independent samples. 

Another possibility exists for inter-
preting Karon’s results. Some form
of selective migration explanation
could apply to the dispersal of Ne-
groes which could account for differ-
ences in affectivity and reactions to
aggression. It is possible that those
Negroes who have remained under the 
severest caste sanctions have remained 
because of their temperament charac- 
teristics and those who have stayed 
under less severe restrictions or moved 
to relatively caste-free environments 
have done so because of their charac-
teristics. In other words, they (and
their children who presumably resem-
ble them) are under or out of caste
sanctions on account of their personal-
ities rather than having certain per-
sonality characteristics on account of
sanctions. Those who can swallow
their aggressions—with occasional out- 
breaks against their fellows—stay, 
those who cannot, go. This explana- 
tion seems fully as plausible as the 
other, although we recognize the two
are not mutually exclusive. 


Picture-Frustration Study 


Rosenzweig's Picture Frustration 
Study has been employed in several 
comparative investigations which are 
summarized in Table 4. There seems 
to be a tendency for Negroes to pro- 
ject more aggressive responses (E) 
than do whites. On the whole, simi- 
larities of responses are much more 
pronounced than differences, a situa- 
tion which may merely reflect the 
relative insensitivity of the instru- 
ment. 


Miscellaneous Tests and Other 
Instruments 


In 1942, on the basis of intercorre- 
lations of several scales administered 
to Fisk University students, Negro 
college students were regarded by 
Brunschwig (1942) as appropriate 
subjects with whom to employ tests 
and ratings devised for whites. Wheth- 
er scales standardized on one group 
can be applied with equal validity to 
another racial or ethnic group is still a
question. The TAT studies cited 
above might suggest that they can be. 
Whether or not they can be, we re- 
port studies from uncritical applica- 
tions and also from unstandardized 
instruments. Roughly, this portion 
of the review progresses from children 
to adults and chronologically. 
Reporting results of the same re- 
search in two journals, Hammer 
(1953a, 1953b) compared children in 
grades one through eight on the 
H-T-P Test; no control was exerted 
over socioeconomic variables. Clini- 
cian judges ranked the drawings on a 
six-point scale of neuroticism and a 
three-point scale of aggression. Ne- 
gro children rated higher (at the .01 
level) in aggressiveness. And whereas 
white children ranked between mildly 
neurotic and neurotic, Negro children 
scored on the average above severely 
neurotic. At every grade level, Ne- 
groes ranked higher in neuroticism, 
although at the upper levels white 
children became relatively more neu- 
rotic, thus decreasing the difference. 
Negroes showed specifically more
bleak trees, more concern with the
house, and more organic signs.


TABLE 4
COMPARISONS MADE WITH ORIGINAL AND MODIFIED ROSENZWEIG
PICTURE-FRUSTRATION STUDY

Author(s) | Test | Subjects | Results | Comments
McCary (1951) | PFS | Northern and Southern N and W | SN women > NW women on Eᵃ compared with Iᵃ |
McCary (1956) | PFS | Northern and Southern N and W, 14-22 yrs. | Most scores: NW and NN > SW and SN on E compared with Mᵃ and I | Normative samples should not be used for experiment
McCary & Tracktir (1957) | PFS | High, medium, and low IQ W and N | Low IQ: N>W males on E compared with I and GCRᵃ; Medium IQ: W>N females on N-Pᵃ; High IQ: N>W females ... |
Portnoy & Stacey (1954) | Children's PFS | ... matched | N>W on M | Matching: length of institutionalization, intelligence, etc.
Winslow & Brainerd (1950) | Modified PFS (W or N frustrator) | Matched N and W | Both N & W: E more with N frustrator | Matching: age, sex, education, S-Eᵇ status

ᵃ E=Extrapunitive, I=Intropunitive, M=Impunitive, GCR=Group Conformity Rating, N-P=Need-persistence,
O-D=Obstacle-dominance.
ᵇ S-E=Socioeconomic.


Disregarding the lack of estab- 
lished validity for the H-T-P, we 
consider the clinician’s ratings as too 
severe. Possibly, as Willoughby ob- 
served many years ago, what we call 
normality may only be a widespread 
case of arrested development; never- 
theless, the students on the whole 
in Hammer’s study were probably 
not neurotic according to ordinary 
standards. Should the H-T-P be 
valid, an investigation controlling 
for class and caste variables is called 
for rather than the type reviewed here. 
Among Letchworth Village mental 
defectives, it was found (Abel, 1943) 
that white and Negro girls, matched 
on relative level of IQ, more or less 
leadership ability, and other relevant 
variables, differed in that the Negro 
girls displayed more dominant be- 
havior in imposing their judgments 
upon the white girls and making the 
decisions. Boyd’s (1952) results,
working with children of more or less 
normal intelligence in a nonsegre- 
gated elementary school, may be 
juxtaposed with the above. On two 
tests and a questionnaire designed to 
determine levels of aspiration, matched 
groups (age, IQ, economic status) 
revealed on the target test, question- 
naire, and on the arithmetic test, 
higher Negro l-o-a. In selecting
“the greatest person in the world”
and “the person I would like to be
like,” 24 of 28 persons chosen by the
Negro children were Negro


compensatory acting o 
be explained at least in part on the 
basis that the Negro girls are not as 
defective in reality as their white IQ
counterparts (cf. the discussion 
elsewhere in this article on actual 
versus measured intelligence of low 
IQ Negroes). 

Gray (1944b) made a study of the 
wishes of Negro elementary school 
children and compared them with 
those of white children from a previ- 
ous study. The most striking point is 
the basic similarities of the two 
groups, although Negro children ex- 
pressed more wishes concerned with 
home, animals, and musical instru- 
ments. Doll play fantasies of Negro 
and white children may be inferred 
comparatively from an interracial 
comparison by Graham (1955). Indi- 
viduals ranged from 73-102 months 
and were regarded as homogeneous 
within groups. Although the results
are difficult to evaluate interracially,
it appears that the 30 Negroes pro-
duced fewer total fantasies than the
30 white children. With some excep- 
tions, both groups produced approxi- 
mately the same proportions of 
stereotype (dining room, kitchen, 
etc.) and nonstereotype (affection, 
aggression, etc.) responses. 

One other study of children gives
by implication some meager compar- 
ative information from a tempera- 
ment test. Anderson (1947) admin-
istered two group intelligence scales,
the Otis and California Test of Mental
Maturity, and a group “personality”
test, the California Test of Personal-
ity, to 153 Negro pupils in an Okla-
homa high school. He found the
averages on the intelligence scales at 
or above the means expected, but the 
Negro youth low in personal and 
social maturity on the CTP. The 
author does not mention the question 
of standardization of the CTP on a 
white population. The intelligence 
test results can be cautiously inter-
preted as running counter to the gen-
eral findings elsewhere.

Illiterate soldiers revealed sectional
differences on the Altus Adjustment 
Scale according to Altus and Clark 
(1949). With a higher mean signify-
ing poorer adjustment, the following
means were obtained in this study: 


Southern Negro 7.92 
Northern white 9.40 
Northern Negro 9.56 
Southern white 11.26 


An analysis of variance suggests 
that these differences are not the re- 
sult of chance factors. 
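
The pattern in the four means can be described arithmetically. The sketch below uses only the means reported above and assumes equal weighting of the four cells, since cell sizes are not given here; it describes the means and says nothing about statistical significance.

```python
# Cell means on the Altus Adjustment Scale as listed above
# (higher mean = poorer adjustment).
means = {
    ("South", "Negro"): 7.92,
    ("North", "white"): 9.40,
    ("North", "Negro"): 9.56,
    ("South", "white"): 11.26,
}


def marginal(level, position):
    """Unweighted mean of the cells whose key has `level` at `position`."""
    cells = [m for key, m in means.items() if key[position] == level]
    return sum(cells) / len(cells)


# Difference between sections, averaging over the two groups.
section_difference = marginal("South", 0) - marginal("North", 0)

# Difference between groups, averaging over the two sections.
group_difference = marginal("white", 1) - marginal("Negro", 1)

# Interaction contrast: the white-minus-Negro difference in the South
# minus the same difference in the North.
interaction = (
    (means[("South", "white")] - means[("South", "Negro")])
    - (means[("North", "white")] - means[("North", "Negro")])
)

print(round(section_difference, 2), round(group_difference, 2), round(interaction, 2))
```

On these unweighted means the difference between sections is small while the interaction contrast is large: the two Southern groups lie at the extremes and the two Northern groups fall close together.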

Contrary to Altus’ findings with 
his scale, Felton (1949), employing 
the Cornell Selectee Index with 148 
Negroes and approximately 2400 
whites at Oak Ridge, discovered that 
the former were significantly more 
frequently in the neurotic group. 
The author does not break down the 
figures, but a direct relation shows 
up in the data between lack of educa- 
tion and neuroticism. Although the 
Negro group is described as not from 
the deep South, there is a possibility 
that they were in the less well-edu- 
cated groups. Touchstone (1957) 
found that on a sentence completion 
test neither white nor Negro rated 
higher than the other in passivity, 
aggression, hostility, or withdrawal, 
on all of which the investigator had 
expected to find Negroes scoring 
higher. 


Overview of Temperament Studies 


Living in a white culture the Negro 
seems to have difficulty with his 
frustration-induced aggression, al- 
though some reversals (which may,
to be sure, be test artefacts) suggest
that even under severe caste restric- 
tions aggressive drives may not be 
the most important problem in the 
Negro’s handling interracial interac- 
tions. It seems to be a truism that 
more and better research is called for 
in any area of investigation, but 
especially here where norms have
scarcely begun to be established,
where tests which have even failed to 
prove themselves with white subjects 
are applied almost unquestioningly to 
Negro subjects, and tests with white 
norms are utilized as if class and 
caste distinctions have no bearing on 
temperament or personality as a 
whole. 

So-called ‘‘personality tests” may 
be inappropriate for testing most 
Negroes who are different from whites 
in socioeconomic status (Auld, 1952; 
Hoffman & Albizu-Miranda, 1955) 
and caste. In this area intensive 
studies of whites and Negroes need to 
be performed by scientists who un- 
derstand both psychodynamics, so- 
ciodynamics, and adequate scientific 
procedure. 


VALUES AND ATTITUDES 


Various attempts have been made 
to measure the value systems and 
attitudes of groups of Negroes and 
whites. The studies to be reviewed 
in this section cover a wide range of 
topics and in some instances are re- 
lated only by the above broad head- 
ing. 

Value Systems 


The Allport-Vernon Study of Val- 
ues has been the most frequently 
employed single measuring instru- 
ment (Eagleson & Bell, 1945; Gray, 
1947; Milam & Sumner, 1954; Pugh, 
1951). Eagleson and Bell (1945) 
administered this scale to 164 South- 
ern Negroes and compared the results 
with the original data presented by 
Allport and Vernon. Subjects in the 
latter study were primarily from the 
Northeast. The authors report that 
the means were not greatly different, 
but the data were not sufficient to 
check the significance. Negroes and 
white females were similar in giving 
Religious values a top ranking, but 
differed in that the Negroes scored 
lowest on Aesthetic values, while the
white females gave this a second 
place rating. The authors felt that 
these findings could be explained on 
the basis of a cultural interpretation 
relating to the “suffering” of Negroes 
and the white female’s role in society. 
Gray (1947) compared these find- 
ings with the performance of whites 
from Peabody College and Florida 
State University. These subjects also 
gave Religious values the highest rat- 
ing with the Social scale being placed 
second. Both of these scales were sig- 
nificantly higher than all of the 
others. As compared with the norma- 
tive group these Southern females 
also placed a low emphasis upon 
Aesthetic values. Thus Gray rein- 
forces the conclusion of Eagleson and 
Bell that low Aesthetic and high Re- 
ligious scores are a function of cul- 
ture; this factor seems to apply also 
to Southern white women. Pugh 
(1951) administered the Allport-Ver- 
non to several groups (ministers, lay- 
men, nonchurch members) in south 
Georgia and reported more similar-
ities than differences. Religious val-
rating by all 
groups except white men who gave it 


ngs of a group 
of male Negro medical students were 


egro and Northern 
lue systems have been 

Studying classroom 
"hompson, 1954). 


, imi ority of values 
studied a similar amount of interest 


was reflected in the two groups, Ne. 
a eater emphasis to 
justice and had a weaker identifica- 
tion with family. 

Sommer and Killian (1954) investi-
gated value differences by means of
an attitude scale wherein the subjects 
rated the behavior of “a Negro.” Ne- 
groes want the Negro to be relatively 
forward, passionate, elegant, aggres- 
sive, persistent; whereas whites want 
the Negro to be relatively more witty, 
jovial, practical, quiet, patient. The 
emphasis should be on the word rela- 
tively as the range was restricted by 
the nature of the scale. 


Self-Attitudes


Within recent years there have 
been a number of studies investigat- 
ing the young Negro’s attitude to- 
Available evidence 
Negro 
early about racial 
differences and that a light skin is to 
be preferred. Marks (1942) had Ne-


within their own race there is a pref-
erence for light coloring, although ex-
tremely light skin was not preferred. 
of techniques have been 
employed in evaluating this factor, 
including the use of white and colored 
dolls, special puzzles, picture tests,
choice of playmates, etc. (Goodman,
1952; Helgerson, 1943; Horowitz,
1947; Koch, 1944; Landreth & John-
son, 1953; Radke & Trager, 1950;
Radke, Trager, & Davis, 1949).
Goodman (1952) reported finding a
marked racial awareness in a group of
four-year-old Negroes and whites. 
Consistent patterns of response to 
different skin colors has been ob- 
served in children of three years of 
age (Landreth & Johnson, 1953). 
This study, as well as one by Radke, 
Sutherland, and Rosenberg (1950), 
indicates that the development of a 
negative self-concept has its begin- 
ning in the early childhood of the Ne- 
gro. When compared with whites the 
Negro is much less positive towards 
his own race. Whites tend to express
a strong preference for their own race 
throughout a wide age range, and 
when a variety of techniques have 
been employed. In at least one study 
the proportion of Negroes expressing 
a preference for Negroes increased 
with grade level in school (Koch, 
1946). Negroes have tended to prefer 
light skin colors, but a study among 
Fisk University students reveals that 
for at least that group the reference 
scale used in rating skin color is in 
part a function of the relative colors 
of the judge and the subject rated 
(Marks, 1942). 

In clinical interviews and autobiog- 
raphies of “some eighty Negro youths”
ranging principally from 17 to 25 
years of age (plus some juvenile de- 
linquents and adult patients), Dai 
(1953) endeavored to determine those 
problems which are shared with white 
youth and those peculiar to Negroes. 
Clinically-oriented, Dai’s approach 
was in terms of role-self-concepts; his 
interest lay in part in tracing how self- 
attitudes developed in childhood re- 
late to adult problems. 

Dai cites illustrative case material 
elicited from Negro youth, which 
could be found in counterpart in most 
texts on adolescent psychology, in 
relation to reaction patterns within 
the primary group, the family: an- 
tagonism and open revolt, passive-ag- 
gressive reactions, extreme conform- 
ity, and so forth. Problems peculiar 
to Negro children, Dai concludes, are 
basically human problems, but col- 
ored by the fact of being Negro. 
Caste restrictions bring the following 
influences to bear on personality de- 
velopment: a preponderance of lower 
class families with their special (and 
to the middle class individual, im- 
moral) codes of conduct, broken 
homes, (consequent) maternal dom- 
inance, preoccupation with skin color 
and other physical features, and an 


extraordinary stress on social status. 
The factor most different from being 
a white youth, i.e., being a youth with 
dark skin, yields differential reactions 
in a sense of unworthiness, i.e., ab- 
sorbing the white’s evaluation of 
dark skin, in developing a sense of in- 
difference—‘‘So what?’’—or in iden- 
tifying with whites to the point of be- 
coming white in judgments of blacks. 

Controlled investigations of some 
of Dai’s conclusions are necessary, 
for his study suffers from the general 
defects of a posteriori reasoning of 
many clinical studies and from lack 
of the precision which even good clin- 
ical studies can have. His conclusions 
do, however, coincide with those 
reached in the American Council on 
Education summary of studies (Suth- 
erland, 1942) on ‘‘What does it mean 
for developing personality to be born 
a Negro?” 


Social Perceptions and Attitudes 


One study based upon choice of 
playmate in preschool children gives 
the racial factor a secondary weight- 
ing (Helgerson, 1943). The sex factor 
was considered the most important 
single determinant. It may be signifi- 
cant that this particular study was 
conducted in Minneapolis. In con- 
trast to the Koch study mentioned 
above, older children chose colored 
playmates less frequently in both 
Negro and white groups. 

Relatively detailed analysis of 
children’s social perceptions and atti- 
tudes may be found in Radke and 
Trager (1950), Trager and Yarrow 
(1952), Radke, Trager, and Davis 
(1949), Bird, Monachesi, and Bur- 
dick (1952), and Horowitz (1947). 
For the most part the results are con- 
sistent with the points mentioned 
above. However, this entire series of 
studies suffers from many of the 
methodological errors referred to else- 
where in this paper. In many in- 
stances conclusions are based on what
appear to be nonsignificant group dif- 
ferences; the race of the examiner is 
controlled infrequently; social class 
differences are seldom considered. 
Concern with patterns of prefer- 
ence and attitudes toward other ra- 
cial groups has also been reflected in 
a series of studies with adults. 
Studies prior to the period covered by 
this review suggested that patterns 
of preference of Negro and white col- 
lege students were similar. Prothro 
and Jensen (1952) administered the 
Grice-Remmers Generalized Attitude 
Scale to Southern white and Negro 
college students with results indicat- 
ing that the preferences of the two 
groups are not similar. Among other 
findings the authors report that the 
attitudes of the Negro towards the 
white were no more favorable than 
the reverse. Attitudes of the Negro 
towards Jews were generally favor-
able as were those of the whites
towards Jews. Gray and Thompson
(1953) report different findings on
this latter point. On a modified
Bogardus Social Distance Scale the 
Negro subjects voted all groups ex- 
cept their own lower than did the 
white subjects. The scale was ad- 
ministered to college and high school 


, with white groups being more 
tween races were most pronounced
with regard to social relations. The 
authors felt that the differences were 
becoming greater, primarily because 
of a shift on the part of the Negro. 
The questionnaire was administered 
again in the same schools in 1948 
(Mayo & Kinzer, 1950), with the 
finding of a more favorable attitude 
towards the Negro by both races. 
However, the two groups were further 
from agreement on issues involving 
interracial relations, and the shift to- 
wards a more positive attitude to- 
wards the Negro came primarily on 
items not implying social proximity. 

Holmes (1943), working with 
groups from four colleges reported 
the Negro more liberal in racial atti- 
tudes than the white and the not 
surprising finding that Negro stu- 
dents in the South were more con-
servative than those in the North,
with white students in the South
vastly more conservative than any 
of the other groups tested. Green- 
berg, Chase, and Cannon (1957) ad- 
ministered the California F Scale 
and an integration attitude scale to 
west Texas high school students. In 
this instance, the Negro was more 
authoritarian than the white, but the 
authoritarian attitudes of either race 
were not indicative of negative atti- 
tudes toward integration.

Scholer (1943) had eighth grade 
students in Louisiana indicate their 
degree of approval of racial segrega- 
tion in hypothetical case situations. 
Total scores failed to reflect a signifi- 
cant difference between Negroes and 
whites, although the authors report 
significant differences in 7 of the 11 
situations presented. But the authors
conclude that race similarities were 
more frequent than differences and 
there were no points of complete 
contradiction between the two 
groups. Preferences for segregation 
exceeded those for nonsegregation
among both whites and Negroes, but 
to a greater degree among whites. 

In a doctoral study Banks (1950) 
used a slightly different approach 
and had Ohio Negroes rank 96 differ-
ent situations as to the degree of re-
sentment experienced. The results 
were compared with Myrdal’s (1944) 
rank order of discriminations: (1) 
economic, (2) legal, (3) political, (4) 
access to public services, facilities 
and funds, (5) courtesies, (6) sex rela-
tions and intermarriage. Myrdal
feels that these are reversed for 
whites. However, the subjects in 
this study gave the same rankings ex- 
cept that economic discrimination 
dropped to third place. Thus, it 
seems that for at least the popula- 
tion employed in this study white and 
Negro concerns are very similar. In 
a study of the influence of discrim- 
ination on minority group members 
in New York, a larger percentage of 
both Negroes and Jews reported ex- 
periencing discrimination more fre- 
quently than did Catholics or Prot- 
estants (Saenger & Gordon, 1950). 
When compared with the other eth- 
nic groups studied the Negroes par- 
ticularly felt discriminated against 
in terms of job opportunities. 

One final study dealing directly 
with attitudes towards segregation 
and schools should be mentioned. 
Turman and Holtzman (1955) sur- 
veyed a group of teachers in Texas 
on this question and found that only 
4% of the whites and 1% of the Ne- 
groes held out consistently for segre- 
gation at all levels of education. 
Forty-four percent of the whites and 
57% of the Negroes expressed com- 
plete approval of mixed classes in 
public schools. 

Attitudes toward other specific 
situations have been evaluated in 
several instances. Clarke and Camp- 
bell (1955) had junior high school
students estimate their Negro class- 
mates’ performance on objective tests. 
Negroes were significantly more ac- 
curate in estimating their fellow Ne- 
groes’ scores. The estimates of the 
whites were below those of the Ne- 
groes. The authors interpreted the 
data as reflecting a white stereo- 
type of low Negro ability. One man- 
ner in which this stereotype gets re- 
inforced may be shown in an anal- 
ysis of magazine pictures (Shuey, 
King, & Griffith, 1953). Pictures of 
Negroes in leading popular maga- 
zines were judged on the basis of the 
socioeconomic level thought to be 
reflected. Negroes were portrayed in 
a less favorable light than in reality 
and actually appeared in only 3 of 
1% of the total pictures. 

In Flint, Michigan, students were 
asked to predict teachers’ attitudes 
toward Negro children (Amos, 1952). 
In this instance the whites were more 
accurate than the Negroes who 
showed more prejudice, stronger feel- 
ings of rejection, and more conscious- 
ness of race. However, once again 
when social class was considered there 
was some evidence that this factor 
was more important than race in de- 
termining pupils’ attitudes towards 
teachers. In a relatively unsophisti- 
cated study by Lewis and Biber 
(1951) Negro children when given an 
opportunity expressed preference for 
white teachers; but those who had 
had a Negro teacher inclined to- 
wards choosing a Negro teacher. 

Attitudes towards the Negro tend 
to operate in a number of experi- 
mental situations and certainly have 
influenced the results obtained in the 
studies reported above. In addition
to studies involving the use of psy- 
chological tests, the color of the in- 
vestigator has been shown to be a 
relevant variable in such diverse 
things as work with GSR (Rankin &
Campbell, 1955) and identification 
tests with young children (Trent, 
1954). 


Overview of Values and Attitudes 


Insofar as generalizations can be 
made from sheer weight, the evi- 
dence points to similarities in the 
value systems of whites and Negroes. 
Differences in self-concepts are 
marked, however, in that being a 
white person in a white society ap- 
pears to mean little in respect to the 
development of self-concepts, whereas 
being a Negro in a white society 
seems to be one of the most im- 
portant factors in such development.

The quality of the studies reviewed 
in this section has varied consider- 
ably. Values and attitudes are diffi- 
cult to quantify, subject to regional 
variation and other factors which 
make generalization difficult. One 
obvious step in clarifying group dif- 
ferences requires the study of popu- 
lations other than those drawn from 
schools and colleges. 


VOCATIONAL INTERESTS


This is another area in which lim- 
ited research has been reported. A 
PhD thesis by Hartshorn purport-
edly discovered interest scores on the 
Strong Vocational Interest Blank to 
be different for lawyers, physicians, 
and life insurance agents of the two 
groups, in that white men of the 


scores, and also more masculine in-
parts, and fewer specific likes than
compared are not equivalent (though
the major reason Strong offers is not 
sufficient reason for his contention). 
Supporting Strong to a slight degree 
in his denial of the contention that 
interests of Negro and white profes- 
sional persons really differ is his re- 
search on medical school seniors of 
both races (Strong, 1955); the dif- 
ferences in occupational and spe-
cialization scales are in most cases
relatively small. Yet out of 14 occu-
pational scales one is significantly
different and six are very signifi-
cantly different for whites and Ne-
groes, while three out of five spe-
cialization scales differ at least at
the 5% level of confidence; Strong’s
concluding remark in his analysis of
Hartshorn’s data could conceivably
do without the qualifying conditional
clauses: “If Negroes are really dif-
ferent from whites in their interests,
which we are not ready to accept as a
fact, then their interests must be
measured from a Negro, not a white,
point of reference” (Strong, 1955).

Milam and Sumner (1954) have 
provided further information on the 
spread and intensity of vocational
interests of first year Negro medical
students. They report that low vo-
cational interest intensity in Negroes
has been a consistent finding with the
Strong Vocational Interest Blank
and the controversy turns only on
why it is lower. After comparing per-
formance on the Strong with aca-
demic grades, the authors conclude
that there is some evidence to the ef-
fect that high scholastic ability is
correlated with more intense physi-
cian interest as well as more intense
and/or diversified nonphysician in-
terests.

Gray (1944a) inquired directly of 
some 800 Negro children (first to 
sixth grades) as to what their voca- 
tional preferences were. Results were 
compared with the responses of white
children obtained six years earlier by 
Boynton. The Negro and white fe- 
males voiced similar preferences, but 
the Negro male was more interested 
in professional occupations than the 
white male. The median occupa- 
tional level chosen by the Negro on a
five-point scale was one point higher 
than the white child’s. Data were 
not presented to the extent that sta- 
tistical significance could be evalu- 
ated, but it appears that the Negro 
children were certainly less realistic. 
Several studies contrasting Negro 
and white occupational patterns 
clearly point to the fact that jobs of 
higher quality generally are reserved 
for whites (Keenan & Kerr, 1952; 
Mundy, 1949; Turner, 1954). 


SOCIAL STRUCTURE 
Leadership 


Research on comparative qualities 
of leadership has been extremely lim- 
ited and, for the most part, has been 
concerned with the evaluation of mis- 
cellaneous factors. Dexter and Stein 
(1955) administered various tempera- 
ment tests to leaders and nonleaders 
in a white and Negro college, with 
the finding that differences within 
each group were greater than dif- 
ferences between groups. Some in- 
direct information of a comparative 
nature is available from studies of 
Who's Who in Colored America (Mon- 
ahan & Monahan, 1956; Valien & 
Horton, 1954). Relatively speaking 
it seems that the Negro female leader 
receives more recognition than the 
white female and achieves distinc- 
tion at a younger age than does the 
white person of either sex. 


Family Organization 


In this section various recent mis- 
cellaneous studies relating to Negro 
and white family practices will be re- 
viewed. For comprehensive reviews
the reader is referred to the general 
references listed in the beginning of 
this article. 

The dynamics of the Negro family 
have been discussed in some detail 
(Davis & Havighurst, 1946; Frazier, 
1939, 1957; Myrdal, 1944). For the 
most part, the emphasis has been 
placed upon family disorganization. 
Statistics have been quoted from the 
1950 census to point out the extent to 
which disrupting socioeconomic and 
cultural factors are operating (Ginz- 
berg, 1956). At that time one-third 
of the Negro women who had been 
married were divorced or separated 
from their husbands as compared 
with only one-fifth of the white wo- 
men. Thirty-five percent of the Ne- 
gro mothers under 45 were employed 
versus 19% of the white women in 
this same age group. In a theoretical 
discussion of the problems and needs 
of Negro youth, Frazier (1950) states 
that the male parent is absent in 
about 20% of the Negro homes. 
These statistics would appear to be 
relatively significant and confirm the 
existence of greater family disorgan- 
ization among Negroes, as in general 
is the case with lower socioeconomic 
groups (Hollingshead, 1953). If
there is any validity to currently 
held theories of personality develop- 
ment, we would expect some distinc- 
tive group personality differences as 
a reflection of the above facts. 

Several studies on child-rearing 
practices among Negroes and whites 
have been referred to under the sec- 
tion on intelligence. Davis and
Havighurst (1946), after studying 
groups of mothers from lower and 
middle classes in Chicago, concluded 
that essentially the same types of 
differences prevailed between middle 
and lower class whites as between 
middle and lower class Negroes. The 
major exception was that Negro
mothers were described as more per- 
missive than whites in feeding and 
weaning, but more rigorous in toilet 
training. Aside from this the authors 
conclude that the most striking thing 
about the study was that the Negro 
and white middle and lower classes 
were so much alike. 

There have been several compara- 
tive investigations of fertility rates. 
After studying the 1940 census fig- 
ures, Lee and Lee (1952) state that 
the pattern of Negro fertility is re- 
markably similar to that of whites. 
Within both races fertility declines 
as socioeconomic level goes up. As 
with whites, Negro fertility is lower 
outside of the South. The authors 
conclude that the patterns of the 
Negro most closely approach those 
of the white in those areas where he 
shares most freely in the general cul- 
ture. The rate is approximately the 
same in urban areas; any differences 
that do appear occur in rural sec- 
tions. Valien and Vaughn (1951)
and Tietze and Lewit (1953) investi-
gated birth control practices in two
Southern communities and came up
with expected findings: favorable
attitudes were correlated with urban
birth, education, working mothers,
etc.


By means of a question-
naire Negro 
dents were 
factors. 


were drawn from different


the country with differing religious 


backgrounds. Differences which did 
appear to be significant are the 
whites’ high rating of “insight and
understanding,” and the low rating 
of “good health.” For the Negro 
group these two factors had stand- 
ings opposite to those of the whites. 
It is not too remarkable that the 
health factor is of concern to South- 
ern Negroes as their general level of 
health is considerably below that of 
the whites. 

Of incidental interest under this 
section are several studies of Negro- 
white marriages (Cash, 1956; Golden, 
1953, 1954). The subjects of all 
three of these investigations lived in 
Philadelphia, and thus generaliza- 
tions are limited; but the following 
findings were reported. In most such
marriages the male is the Negro;
more than 50% of both parties had
been previously married; they have
few children and tend to marry late;
most of the Negroes were Negroid in
appearance.

Summarizing the literature on the 
Negro family dealing with descrip- 
tive, theoretical, and quantifiable 
comparisons with white families, we 
are led to believe that many of the 
differences that have been reported 
can be accounted for in terms of 
socioeconomic class differences.


EMOTIONAL DISTURBANCES AND 
MENTAL ILLNESS 


Attempts to assess the relative 
emotional stability of Negroes and 
whites have ranged from the use of 
paper and pencil tests of neuroticism 
to comparisons of first admissions to 
state hospitals. The test approach 
has not made a significant contribu- 
tion to a clearer understanding of 
differences between the two groups, 
primarily because there are few tests
available which have been standard- 
ized on Negro populations. In an ex- 
perimental study Roberts (1944) has
shown that measures of attitudes, 
adjustment, and personality yield re- 
sults dependent upon the cultural 
and racial group in which the meas- 
ure is made; thus extreme caution is 
necessary when applying popular 
American tests to Negroes. It is also 
recognized that the use of hospital 
statistics is open to a number of er- 
rors. 

Attempts to measure Negro-white
differences in neuroticism have been 
limited primarily to college popula- 
tions. The studies reported have been 
contradictory. Heyman (1945) has 
observed that there is a tendency to 
ignore psychoneurotic behavior in 
the Negro, even though the observed 
symptoms are very similar to those
in whites. Boykin (1957) adminis- 
tered the Bell Adjustment Inventory 
to college freshmen over a four-year 
period and found 25% “poorly ad-
justed.” His findings were compared
with the original standardization 
groups; of his two “poorly adjusted” 
groups, the Negro group was more 
maladjusted. It was not possible to 
tell from the statistics presented if 
this difference was statistically sig- 
nificant. The Bernreuter Personality 
Inventory has been administered to 
Negro groups on at least two occa- 
sions. Wheatley and Sumner (1946) 
found no differences between college 
student performance and the original 
norms. They concluded that the 
most neurotic scores were obtained 
from the lowest socioeconomic classes. 
Sumner (1948) on the other hand 
found no relationship between Bern- 
reuter scores and the socioeconomic 
status of Negro college women. The 
California Test of Personality has 
been administered to second and 
third grade Negro and white groups 
without obtaining striking differ- 
ences, although the author con- 


cluded that the minority group tends 
to feel persecuted (Engle, 1945). 
Rowntree (1943) studied dis- 
charges from the Service for psychi- 
atric reasons in the period 1941-43 
and concluded that a diagnosis of 
psychoneurosis occurs four times as 
frequently among Negroes as among 
whites. The sampling in this study 
leaves something to be desired; and 
such factors as geographical repre- 
sentation among the two groups were 
not controlled. In a study of 105 
consecutive discharges from the 
Navy a diagnosis of psychoneurosis 
occurred approximately three times 
as frequently among Negroes as 
among whites (Hunt, 1947). The 
author, however, checked these re- 
sults with four other stations and 
concluded that the prevalence of psy- 
choneurosis among Negroes is lower 
than among whites. Gardner and 
Aaron (1946) reviewed consecutive 
admissions to the psychiatric ward of 
a Naval hospital and concluded that 
whites were more prone to a psycho- 
neurotic reaction than Negroes. In 
seeming contradiction to these con- 
clusions Ripley and Wolf (1947) re- 
ported that minor psychiatric illness 
occurred two and one-half times more 
frequently among Negro troops over- 
seas than among comparable white 
troops. These results were felt to 
be due in part to lower standards of 
acceptance for service among Ne- 
groes. From these studies we are 
left wondering whether psychoneu- 
rosis actually is more extensive among 
Negroes or if standards of judging 
psychoneurosis differ for whites and 
Negroes. 

The picture is clearer when hospital
admissions are considered. If first
admissions to state hospitals are
taken as a measure of relative inci-
dence of psychoses, the occurrence is
approximately twice as great among
Negroes (Frumkin, 1954; Ivins,
1950; Malzberg, 1940; Wilson & 
Lantz, 1957). This ratio seems to 
hold over a number of years and in 
widely separated states, with the re- 
ported range being from one-and-one- 
half to four-and-one-half times as 
great among Negroes. The only con- 
tradiction found was in a study by 
McLean (1949) who reported that 
first admissions to Illinois State Hos- 
pitals were no greater in the Negro 
than in the white. In explaining this 
finding he made note of the fact that 
Illinois is well established as an inte- 
grated state. 

Even when allowances are made 
for the fact that whites are more 
likely to be able to afford private 
hospitalization, it seems apparent 
that the relative incidence of psy- 
choses among Negroes is significantly 
higher. There are also some con- 
sistent differences in terms of psy- 


and symptoms. 


1 is that of 
Wilson and Lantz (1957), who found 


that the higher rate of first admis-
sions among Negroes in Virginia was

psychoses (although there was a
greater number of whites over 65),
arteriosclerotic dementia, and schizo-
phrenia.

Other differences in symptomatol-
ogy have been noted. Psychosomatic
disorders are found more frequently
among whites, and in some instances
—e.g., peptic ulcers—the rate is as
much as ten times greater (Rown-
tree, 1943). The suicide rate seems
to differ according to geographical
location. McLean (1949) reports 
that Negroes in the South have a rate 
only one-fourth that of the whites, 
whereas the ratio is almost equal in 
the North. She advanced the ex- 
planation that the higher rate among 
Northern Negroes was due to “ambi- 
tion.” There are also differences in 
“acting out” behavior which will be 
discussed in more detail below. In 
his studies of New York State Hos- 
pital, Malzberg (1956a) found that 
the admission rate for Negroes to 
hospitals for the criminally insane 
was four-and-one-half times that of 
whites.

A number of hypotheses have been 
advanced to explain the higher inci- 
dence of emotional difficulties among 
Negroes. Various factors associated
with socioeconomic status have been
of major interest in this regard. Pasa- 
manick, Knoblock, and Lilienfeld 
(1956) conclude that there is a 


positive and probably etiologic relationship
between socioeconomic status and prenatal
and paranatal abnormalities which in turn
are related to retarded behavioral develop-
ment and certain NP disorders such as cere-
bral palsy, mental deficiency, and behavior
disorders.


Negroes were reported as having a 
much greater number of prematuri- 
ties and complications than white 
controls with the overall incidence of 
abnormal conditions being almost 
twice as great in the Negro. This ra- 
tio held up when socioeconomic sta- 
tus was taken into consideration, but 
since many Negroes were below the 
lowest white, socioeconomic status
was assigned greater weight than any
“racial” factors. Hollingshead and
Redlich (1953) found in their New
Haven study that class is directly
related to mental disorder and that
schizophrenia occurs most frequently
in lower classes. This is one of the psy-
chiatric categories in which the
Negro has consistently been reported 
as having a high incidence. 

Other factors felt to contribute to 
emotional illness in the Negro have 
centered in family structure, eco-
nomic problems, and prejudice.
Gardner and Aaron (1946) studied 
childhood and adolescent adjustment 
and found many similarities between 
white and Negro psychiatric casual- 
ties in a Naval hospital. However, 
the Negroes were more likely to have 
an unstable or “broken” home back- 
ground, and more likely to have been 
enuretic in childhood. Ellis and
Beechley (1950) compared Negro 
and white children seen at a child 
guidance center on a number of vari- 
ables and concluded that the Negro 
children were more disturbed, of
lower socioeconomic status, came
more often from broken homes, and
were less responsive to treatment.
Racial conflict was found in 32 of 45
consecutive Negro clients of a child
guidance clinic and was felt by the 
investigator to be a contributing 
factor in the emotional problems of 
the children (Verin, 1944). An at- 
tempt to evaluate cultural factors in 
mental illness is reported by Cer- 
vantes (1954) who studied 30 Nor- 
thern born Negro patients and 30 
“average” whites. Socioeconomic
differences between the groups were 
felt to be important, but the design 
of the study was such that the differ- 
ences were not clearly demonstrated. 
In studying first admissions in state 
hospitals Malzberg (1956b) reported 
a higher rate among New York Ne- 
gro males than among foreign born 
Negro males. Prejudice was sug- 
gested as an explanation for this dif- 
ference; but the same ratio did not 
hold for females, a fact which weak- 
ens his argument. In another study 
of first admissions to state hospitals 


Malzberg (1956a) reports that for 
the most part Negroes tend to follow 
the overall pattern in that single, 
divorced, or widowed individuals 
have higher rates than married.

Several papers have appeared de- 
scribing the difficulty in treating psy- 
chiatric problems in the Negro as op- 
posed to whites (Adams, 1950; Harms, 
Kobler, & Sweeney, 1945; Heine, 
1950; Kennedy, 1952; St. Clair, 
1951). In describing the Negro the 
following features are frequently 
mentioned: concern with race con- 
sciousness, tendency to act out, hos- 
tility as a dominant problem, dis- 
trustful, self-hating, strong prestige 
needs, difficulty in establishing rap- 
port. The references cited are not
experimental studies but clinical im- 
pressions that are frequently re- 
ported and of unknown validity. But 
there have been several attempts to 
quantify Negro-white differences in 
response to treatment. Most of these 
studies reflect a greater likelihood of 
improvement in the white. Blas- 
singille (1955) compared the rehabil- 
itation of 70 Negro leucotomy pa- 
tients at a VA hospital with that of 
similar patients reported in the litera- 
ture and considered the results sim- 
ilar. Cultural, social, and economic 
factors peculiar to the southern Ne- 
gro were felt not to be significant in 
rehabilitation of this type patient. In 
other studies whites have been found 
to respond more favorably to treat- 
ment of varying kinds. 

In a review of 455 cases treated
by shock therapy, the whites
improved significantly more
when the criteria em-
ployed were rate of discharge from

Kahn, Buchmueller, and Gildea (1951)
describe a program of group therapy
with parents of behavior problem chil-
dren which failed when employed
with Negroes where it had been con- 
sidered successful with white parents. 
The adjustment of Negro and white 
schizophrenics was studied by Hew- 
lett (1946) with a conclusion that the 
whites were making the better ad- 
justment. She concluded that eco- 
nomic, financial, occupational, and 
familial problems were the factors 
most often related to the adjustment 
made and that these factors were 
most severe among Negroes. Ellis 
and Beechley (1950) feel that some 
of the same factors were operating to 
cause a poorer response to treatment 
among Negro children seen in a child 
guidance clinic. 

For the most part these attempts 
to evaluate objectively the response 
to treatment have resulted in conclusions
in agreement with clinical impressions
reported in the literature.
More definite evidence could not be 
expected in terms of the absence of 
well accepted criteria of improvement
as well as the many extraneous
variables operating simultaneously.
In evaluating research pertaining
to psychiatric disorders among
Negroes, Schermerhorn (1956) comments
that investigators have frequently
been [...]
that the N[...]. By
and large, also, [...]
offered by white [...]
patients. What effect this fact
[...] has
in determining results has not been
assessed.

On the basis of the evidence available
it does appear that the Negroes
more frequently experience psychiatric
difficulties, particularly of a
severe nature. For the most part the
specific forms and course appear to
be similar with the differences at-
tributable at least in part to identifi- 
able socioeconomic and cultural fac- 
tors. Whether genetic endowment ac- 
counts for the unknown variance is 
difficult to tell. However, the trend 
in recent years to find many factors 
of previously unquestioned heredi- 
tary nature attributable to experi- 
ence makes it also reasonable to as- 
sume environmental causation here. 
Thus, the facts concerning emotional 
disturbances are fairly clear, but
their interpretation is far from being
settled.


CRIME AND DELINQUENCY 


Statistics reporting the incidence 
of crimes and delinquency have con- 
sistently shown a higher prevalence 
among Negroes. The few studies 
that have appeared in the literature 
during this period seem not to have
been particularly concerned with the
question of incidence per se, but have 
explored possible causative differ- 
ences between the two races and dif- 
ferential treatment by police officials 
and the courts.

On the basis of a questionnaire 
completed by Philadelphia police- 
men, Kephart (1954) concluded that 
both white and Negro patrolmen are 
more strict with Negro offenders. He 
also noted that Negro offenders more 
frequently resisted arrest and there- 
fore reinforced the attitude of the 
officers. Moses (1947) studied crime 
rates in Baltimore within four socio- 
economically equated areas and found 
in two of them that the Negro rate 
was higher. He did not find evidence
that the Negro offender is convicted
more readily. The pattern of offense
was similar except that crimes involving
loss of life were concentrated
among Negroes. The author had
some reservations about the attempt
to equate the areas in terms of socioeconomic
status and found that as a
whole the whites had been settled
much longer, a greater percentage
owned their homes, etc.

Pollack (1944) reviewed Negro
and white admissions to Pennsylvania
State prisons in 1941-42, being
concerned only with subjects who 
were aged 50 or over. The Negro 
was charged more frequently with 
aggressive assault, criminal homi- 
cide, and liquor law violations. Pol- 
lack reported that the decrease of in- 
cidence of various crimes at age 50 
and over was comparable in the two 
groups. The significance of the difference
reported could not be evaluated
as the data were not reported
in the article.

On the basis of statistics compiled
in Philadelphia in 1948, Diggs (1950) 
found several significant differences 
in both offenses and disposal of of- 
fenders between children of the two 
races. A smaller number of Negro
children were dismissed or discharged 
and a larger number were institu- 
tionalized or referred to a criminal 
court. The Negro was less likely to 
be referred to private agencies for 
treatment as opposed to public agen- 
cies. The leading offense for Negro 
boys was that of taking the property 
of another as opposed to carelessness 
or mischief for non-Negro boys. Sex- 
ual offenses led for Negro girls, while 
white girls came to the attention of 
the court most frequently for running 
away from home. Diggs reported 
that only one-fourth of the Negro de- 
linquents had both natural parents 
in the home, but the comparable fig- 
ure for whites was not reported. 

Axelrad (1952) studied the records 
of 300 institutionalized delinquents 
in New York City. He concluded 
that in comparison with whites, Ne- 
gro children were committed younger, 
for less serious offenses, and with fewer
previous court appearances; they
came from more unstable homes and 
homes with a different kind of pathol- 
ogy than that of white delinquents. 

The Negro crime rate seems to be 
consistently higher when a single 
overall breakdown is made by race. 
On the basis of recent comparative 
studies it is difficult to make any 
judgment as to the relative weights 
of heredity and environment in this 
area. It is the writers’ opinion that 
only environmental factors are re- 
sponsible for differential criminal or 
delinquent behavior. But if we were 
to depend on the studies reported 
here, we must frankly admit we 
should have little basis for the opin- 
ion. 


SUMMARY AND CONCLUDING 
REMARKS 


In psychophysical and psycho- 
motor functions, differences appear 
between whites and Negroes which 
may not be accounted for by dif- 
ferential environmental conditions. 
However, a tendency is present in the 
literature to indicate that most differ- 
ences of this nature may be leveled 
off when social and economic vari- 
ables are controlled. Intelligence dif- 
ferences reported in the period cov- 
ered by this review are in the same 
direction as those seen previously, al- 
though infant and young child com- 
parisons suggest greater similarities 
between Negroes and whites in the 
early years. Educational achievement
[...] relative to whites
[...] a pattern of intellectual
differences.

In temperament (“personality”)
studies Rorschach, TAT, PAT, and
P-F Study differences are found, but
once again there is insufficient evidence
to determine the contributions
of genetic constitution and
[...]. At least in those reactions
which indicate responses to a
dominant-group culture, experience 
seems to be the major if not sole de- 
terminant. Overall, likenesses in 
psychodynamics appear more ex- 
tensive than differences. Some, es- 
pecially paper-and-pencil, tests im- 
ply that Negroes are more neurotic 
than whites, although these tests may 
be limited for cross-barrier compari- 
sons. 

Religious values are ranked first by 
Negroes, a condition holding gen- 
erally for white females also, but not 
for white males. Self-concepts seem 
to suffer in the Negro subculture in 
contrast to those of whites. Social 
perceptions correspondingly differ 
from one group to the other, with, 
however, a number of likenesses 
which may not fit into the stereo- 
type the social scientist holds, for ex- 
ample, in attitudes toward segrega- 
tion where some Negroes maintain 
expressed attitudes very much like 
those of the white majority, or in
rank-order of attitudes toward dis- 
crimination. Vocational interests 
may be somewhat similar though less 
intense (by Strong’s criteria) among 
Negroes; Negro children may be less 
realistic. 

As might be expected, in the areas 
of psychological functioning most 
closely related to the sociological, so- 
cial class differences show up more 
clearly as bases of differentiation be- 
tween the two groups. Leadership, 
family life, child-rearing practices, 
fertility, and mate selection all seem
to conform to social structure rather
than to racial lines per se.

Mental illness is considerably more 
prevalent among Negroes than whites 
even though specific symptoms like 
psychosomatic disorders are more 
frequent in the white group. For 
various reasons advanced in the body 
of this article treatment of psychiatric
disturbances is more difficult
with Negroes than with whites. 
Rates of delinquency and crime are 
reported to be higher for Negroes, 
especially in regard to violent crimes. 
Most of the evidence indicates that 
Negroes are given less adequate 
treatment at the hands of officers of 
the law and courts. 

It is clear from the foregoing re- 
view that (a) there are still wide dif- 
ferences between Negro and white in 
many areas of psychological func- 
tioning and (b) a number of differ- 
ences attributed in times past to 
heredity have been shown to be the 
result of social class determination. 
It is not clear whether some dif- 
ferences adumbrated here, specifi- 
cally in the intelligence and tempera- 
ment realms, are genetically based 
or not. We agree with Garrett in his 
Foreword to Shuey’s (1958) book, 
that there are some wholly well- 
meaning persons who hold that 
“... racial differences ought not to
be found; or if found, should immedi- 
ately be explained away as being 
somehow immoral and reprehensi- 
ble.” Nevertheless, we are not satis-
fied that either those who like Gar- 
rett believe that genetic differences 
exist in psychological functions or
those who maintain that no such 
differences can be found have suc- 
ceeded in establishing their position. 
Most students in the period of this
review have leaned to environmentalist
explanations. But this concur-
rence of judgment may be the result 
of a possibly unjustified extrapola- 
tion from point b above. We do
not agree with Garrett when he says
in the same Foreword: “The honest
psychologist, like any true scientist,
has no preconceived racial bias.” As
clinical psychologists we are convinced
that scarcely anyone undertakes
investigation in this field without
preconceived biases. We frankly
are environmentalist in our bias; but 
we also hope that we are “honest psy- 
chologists” enough to recognize that 
many research results can yet be in- 
terpreted from an hereditarian viewpoint
without doing violence to them.

As a last note, from our survey we
have come to the conclusion that re-
search within the United States, to 
which we have limited this review, 
must be supplemented by investiga- 
tions between American and other 
cultures and within other cultures 
where caste differences are relatively 
nonexistent.

In terms of productivity during the
past decade, few areas of study in 
psychology have matched the output 
of research on scales of anxiety. 
While the inundation of papers on 
anxiety has impressed some workers 
and troubled others, it behooves us to 
inquire into (a) the stimulus value of 
anxiety scales for psychologists, (b) 
the contribution of research on anx- 
iety to the body of psychological 
knowledge, and (c) the problems for 
future study raised by this research. 
It is the purpose of this paper to at- 
tempt such an evaluation with emphasis
on the relationship of anxiety
to stress, learning, intelligence, phys- 
iological responses, other personality 
characteristics, and test taking atti- 
tudes. The purport of the paper is 
not to present a general review of all 
studies on anxiety in these areas but 
rather to attempt to abstract from a 
large literature major trends which 
seem of present or potential signif- 
icance. 

An attempt such as the present one 
seems particularly appropriate in 
view of several recent evaluations 
of research involving anxiety scales 
which noted the unreplicability and 
inconsistencies of certain reported 
findings in this field (Bendig & 
Vaughan, 1957; Blake & Mouton, 
1959; Farber & Spence, 1956; Jen- 
sen, 1958; McClelland, 1958). Frus- 
trating as this state of affairs may 
be, the present writer will attempt 
to show that unreplicability is not
necessarily attributable to unreliability
in the anxiety measuring instruments,
but rather to several
“traditional” variables such as characteristics
of Ss and Es, and population
and instructional variables which
are confounded with anxiety measures.

1 The preparation of this paper was facili-


THE STIMULUS VALUE FOR THE PSYCHOLOGIST
OF ANXIETY SCALES


In view of the centrality of the 
concept of anxiety in personality 
theory, it is somewhat surprising 
that attempts to measure the concept 
objectively have developed only in 
recent years. Also, psychologists 
concerned with personality func- 
tioning might well be surprised at 
the context in which the first widely 
used anxiety scale was developed. 
A group of experimental psycholo- 
gists interested in problems of learn- 
ing was responsible for the develop- 
ment of Taylor’s Manifest Anxiety 
Scale (MAS) (Farber, 1955; Taylor, 
1951, 1953, 1956). The main interest 
of these researchers in the MAS was 
in the measurement of Hull’s D in 
human Ss who were being studied in 
learning situations. 

Whereas the work stemming from
the Iowa laboratory was [...]
with the relationship [...],
other researchers have inquired into
the relationship between anxiety
measures and a host of varied [...]
and situations (Eichhorn &
Tracktir, 1955; Eriksen & [...],
1955; Fiedler [...], 1958; [...];
Siegal, 1954; Taft, 1957; [...],
1955). Motivated by the need for
measures of personality relevant to 
such variables as intellectual per- 
formance, reaction to stress, and 
ability to learn, psychologists seized 
upon the objective, easily adminis- 
tered MAS. In view of the absence 
of measures of individual differences 
in anxiety, the motivation underly- 
ing the swift adoption of the MAS
seems clear. However, the criticism 
of Jenkins and Lykken (1957) that 
in some research projects involving 
the MAS the rationale for its use 
has been lacking seems to be a just 
one. 

The availability of the MAS served 
to stimulate its use by researchers 
with varied interests, and it has also 
encouraged other investigators to 
construct other measures of anxiety 
better fitted to their specific needs 
(Bendig, 1956; Dixon, deMonchaux, 
& Sandler, 1957; Lykken, 1957; 
Mandler & Sarason, 1952; Sarason, 
1958b; Welsh, 1952, 1956). As a result,
measures for specific anxieties
such as test anxiety, social anxiety,
and anxiety in children are now readily
available. There is reason to
believe that the various measures of
anxiety in current use are not all 
measuring the same thing (Feldman 
& Siegel, 1958; Goodstein, 1954; 
Gordon & Sarason, 1955; Jackson & 
Bloomberg, 1958; Lauterbach, 1958; 
Sarason, 1959a; Sinick, 1956; Windle,
1955; Zimet & Brackbill, 1956). An
important current problem is the
clarification of the similarities and
differences among existing anxiety
indices. In this connection, Jessor
and Hammond (1957) [...]ance of
anxiety scales
may be numbered and that more
concern will be given to the theoret-
ical bases underlying the use of par- 
ticular measures of anxiety. 


RELATIONSHIP OF ANXIETY 
TO BEHAVIOR 


As has already been indicated, 
existing studies of anxiety literally 
defy summary as a unit. However, it
is possible to discern trends and problems
in certain areas where anxiety
scales have been employed, and it is
with these that this paper will be
concerned.


ANXIETY AND STRESS 


Many investigators have studied 
the reactions of Ss differing in scores 
on anxiety scales to situations posing 
personal threat or stress for Ss.
Typically, the stress has been created
by means of verbal instructions, e.g.,
informing S that he is about to take an
intelligence test. Most investigators
have assumed that high anxious Ss
would be more sensitive to implied
personal threat than would low
anxious Ss.

Although some investigators (Cox 
& Sarason, 1954; Farber & Spence, 
1956; Gynther, 1957; Taylor, 1958) 
have presented evidence not con- 
sistent with this assumption, the 
bulk of the available findings suggest 
that high anxious Ss are affected 
more detrimentally by motivating 
conditions or failure reports than are 
Ss lower in the anxiety score dis- 
tribution (Davidson, Andrews, & 
Ross, 1956; Gordon & Berlyne, 1954; 
Korchin & Levine, 1957; Lucas, 
1952; Mandler & Sarason, 1952; 
Nicholson, 1958; Sarason, 1956a, 
1957a, 1957b, 1959b, 1959c; Sarason,
Mandler, & Craighill, 1952; Sarason
& Palola, 1960; Truax & Martin,
1957; Westrope, 1953). Illustrative
of this type of study is that of Davidson,
Andrews, and Ross (1956) in
which three variables were studied:
(a) MAS scores, (b) reports to Ss
of levels of failure, and (c) speed 
of presentation of task stimuli. Sig- 
nificant interactions were obtained 
among all of the variables, and the 
authors concluded that high anxious 
Ss are more sensitive to experimental 
stress than are low anxious Ss. 

In this connection it is interesting 
to note that high anxious Ss have 
been found to be more self-depreca- 
tory, more self-preoccupied, and gen- 
erally less content with themselves 
than Ss lower in the distribution of 
anxiety scores (Bendig, 1958; Co- 
wen, Heilizer, Axelrod, & Sheldon, 
1957; Doris & Sarason, 1955; Fiedler,
et al., 1958; Holtzman & Bitterman, 
1956; Holtzman, Calvin, & Bitter- 
man, 1952; Trapp & Kausler, 1958; 
Westrope, 1953; Wolf, 1955). It may 
well be that highly motivating or ego- 
involving instructions serve the func- 
tion of arousing these self-oriented 
tendencies. One recent study (Sara- 
son, 1958a) has shown that Ss scor- 
ing high in test anxiety respond 
more positively to reassurance in an 
experimental situation than do low 
anxious Ss. A worthwhile problem 
for future research would seem to be 
the development of techniques for 
the extinction rather than the arousal 
of anxiety responses. 

Consistent with the interpretation 
of anxiety measures as indicators of 
sensitivity to implied personal threat 
is the finding by several investiga- 
tors that there are no differences 
among groups differing in scores on 
anxiety scales when tested under 
neutral and apparently nonthreaten- 
ing conditions (Axelrod, Cowen, & 
Heilizer, 1956; Sarason, 1956a, 1957a, 
1957b; Silverman & Blitz, 1956). 
Sarason, in a series of three experi- 
ments (1956a, 1957a, 1957b) involv- 
ing the effects of anxiety and ex- 
perimental stress on verbal learning, 
failed to find under pre-experimental
neutral conditions significant dif-
ferences in performance between 
groups which differed in anxiety, al- 
though varying performance was ob- 
tained under later conditions of 
personal threat. This suggests a sensi- 
tivity interpretation of anxiety sim- 
ilar to the one offered by Davidson, 
Andrews, and Ross (1956). Further- 
more, evidence recently reported 
suggests the possibility that the more 
directly related the content of items 
on the anxiety scale is to the situa- 
tion in which Ss are to perform, the 
more useful is the measure of anxiety 
in showing interactions between 
scores on the scale and differential 
motivating instructions (Raphelson, 
1957; Sarason, 1958a, 1959a, 1959c; 
Sarason & Palola, 1960). 

The results of studies on anxiety 
and stress have led to what might be 
called a habit interpretation of anx- 
iety (Child, 1954; Davidson, et al., 
1956; S. A. Mednick, 1957; Nichol- 
son, 1958; Sampson & Bindra, 1954; 
Sarason, 1958b, 1959a). This inter- 
pretation, briefly put, states that Ss 
scoring high and low in anxiety dif- 
fer in the response tendencies acti- 
vated by personally threatening con- 
ditions. Whereas low scoring Ss may 
react to such conditions with in- 
creased effort and attention to the 
task at hand, high scoring Ss respond 
to threat with self-oriented, person- 
alized responses. More information 
is needed to clarify the conditions
associated with the development of heightened
responsiveness [...]. The
rapidly burgeoning interest in the
[...]
can be helpful [...] (Castaneda,
McCandless, & Palermo, 1956; [...],
1956; Sarason, Davidson, Lighthall,
& Waite, 1958; Waite, Sarason,
Lighthall, & Davidson, 1958).

A neglected problem in the crea-
tion of experimental stress situations 
is that of the E as an agent in creat- 
ing a threat to S. Even when quite 
explicit motivating instructions are 
administered to S, there remains the 
problem of the administration of 
these instructions. The problem of 
variance among Es in the manner 
with which instructions are commu- 
nicated cannot be overemphasized. 
Systematic study is needed of the 
relationship between E variables
such as sex and personality of E and 
anxiety aroused. 


ANXIETY AND TASK VARIABLES 


As has already been mentioned, the 
originators of the MAS considered it 
to be a measure of drive, D, and were 
primarily interested in relating it to 
the concept of the response hierarchy. 
In simple, one-response situations 
such as eyelid conditioning, it was 
predicted that high anxious Ss would 
perform at higher levels than would 
low anxious Ss. However, as the
complexity (e.g., intralist similarity) 
of the task to be learned increased, a 
superiority of low to high anxious Ss 
was expected. 
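
The reasoning behind these predictions can be sketched numerically. What
follows is only a minimal illustration, assuming the multiplicative
combination of drive and habit strength (E = D x H) usually associated
with Hull's system; the function names and all numerical values are
hypothetical and are not taken from any study cited here.

```python
def excitatory_potential(drive, habit):
    """Assumed Hull-type multiplicative rule: E = D x H."""
    return drive * habit


def correct_margin(drive, habit_correct, habit_competing):
    """E of the correct response minus E of the strongest competing response."""
    return (excitatory_potential(drive, habit_correct)
            - excitatory_potential(drive, habit_competing))


# Hypothetical values, chosen only for illustration.
LOW_DRIVE, HIGH_DRIVE = 1.0, 2.0
# Simple task: the correct response has no effective competitor.
simple_task = {"habit_correct": 0.6, "habit_competing": 0.0}
# Complex task: an incorrect response starts out stronger than the correct one.
complex_task = {"habit_correct": 0.3, "habit_competing": 0.5}

for name, task in (("simple", simple_task), ("complex", complex_task)):
    low = correct_margin(LOW_DRIVE, **task)
    high = correct_margin(HIGH_DRIVE, **task)
    print(f"{name}: low-drive margin = {low:+.2f}, high-drive margin = {high:+.2f}")
```

Under this assumption, raising D widens the margin in favor of the
correct response when that response is already dominant, and widens the
margin against it when a competing response is stronger.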

A number of the studies conducted 
within this framework have supported 
these assumptions (Farber & Spence, 
1953; Montague, 1953; Ramond,
1953; Spence, Farber, & McFann,
1956; Taylor, 1951; Taylor & Spence,
1952). For example, Montague
(1953) compared high and low anxiety
groups in ability to learn lists
of nonsense syllables which differed
in association value and intralist
similarity. A significant interaction
was obtained with low anxious Ss superior
to high anxious Ss on the most
complex or difficult task. On the
least complex task, high were superior 
to low anxious Ss. These findings, al- 
though subject to alternative inter- 
pretations, are in accord with Hullian
expectations. Farber (1955) and
Taylor (1956) have presented sum- 
maries and analyses of work on 
anxiety from a drive point of view. 

Despite these positive findings, a 
review of the literature also reveals a 
number of other studies either con- 
tradictory to or not consistent with 
a drive interpretation of anxiety 
(Axelrod, et al., 1956; Bindra, Pater- 
son & Strzelecki, 1955; Deese, Laz- 
arus, & Keenan, 1953; Heilizer, Axel- 
rod, & Cowen, 1956; Kamin & Clark, 
1957; Kamin & Fedorchak, 1957; 
Saltz & Hoehn, 1957; Silverman & 
Blitz, 1956). Several of these studies 
were specifically designed to test pre- 
dictions from Hullian theory concern- 
ing the performance on either simple 
or complex tasks of groups scoring 
high and low on anxiety scales. An especially
interesting experiment was
performed by Bindra, Paterson, and
Strzelecki (1955). They did not obtain
significant differences between
high and low anxious Ss in simple
conditioning. It is interesting that
their situation involved a nondefensive
response rather than the defensive
one used in many drive studies.
The threatening aspects of receiving 
puffs of air in the region of the eye 
may be much more crucial in affect- 
ing performance than the lack of re- 
sponse hierarchy competition hypoth- 
esized for such one-response situa- 
tions (Hilgard, Jones, & Kaplan, 
1951; Kamin, 1955). It seems likely 
that for certain tasks there exists a
confounding of task simplicity with
task stressfulness. Korchin and
Levine (1957) have actually interpreted
complexity of the learning
situation not so much as a task variable
but as a stress variable. Kausler
and Trapp (1959) have recently presented
a critique of the drive interpretation
of anxiety which discusses
other problems along these lines.

One particular problem suggested
by studies of anxiety in which com-
plex tasks are used is this dual aspect 
of task complexity. A complex task 
can be difficult and at least poten- 
tially threatening to S. Under what 
conditions either or both of these as- 
pects of task complexity are operative 
has as yet not been systematically 
studied. Certainly a closer tie-in 
between studies of anxiety and stress 
and studies of anxiety and task fac- 
tors seems indicated. 

In an attempt in this direction, 
Sarason and Palola (1960) manipu- 
lated simultaneously the variables of 
anxiety, differential motivating in- 
structions, and task complexity in 
three experiments. Significant triple 
interactions involving the three vari- 
ables studied were obtained in every 
case. These results are in accord with 
the dual properties of task complexity 
already mentioned. They seem to 
suggest the necessity of developing 
an integrated interpretation of anx- 
iety in terms of the experimental 
conditions most detrimental to the 
performance of high anxious Ss. For 
example, the combined use of high 
threat and high complexity of task 
might lead to larger differences in 
performance between high and low 
anxious Ss than the manipulation of 
either threat or complexity alone. A 
study by Taylor (1958) illustrates 
the need for this type of research, and 
Nicholson (1958) has recently pre- 
sented findings consistent with this 
formulation. 

In addition to needed advances in 
theory in integrating the anxiety, 
motivational, and task variables, it is 
imperative that theories of anxiety 
also incorporate such variables as the 
sex of S and E. This was suggested
by Kamin and Clark (1957) and has 
been most dramatically illuminated 
by the results of a group of re- 
searchers at the University of Roches- 
ter (Axelrod, et al., 1956; Heilizer,
et al., 1956). These workers have
consistently shown significant inter- 
actions between (a) anxiety scores, 
(b) sex of S, and (c) E characteristics. 
These latter two variables related 
more powerfully to anxiety of Ss than 
did task complexity, the primary 
focus of their research. As these au- 
thors point out, psychological theory 
has failed to deal systematically with 
the S and E variables.


ANXIETY AND INTELLIGENCE 


Although several investigators 
have reported negative relationships 
between MAS scores and intellectual 
performance for certain S popula- 
tions (Grice, 1955; Kerrick, 1955; 
Matarazzo, Ulett, Guze, & Saslow, 
1954; Siegman, 1956a, 1956b; Spiel- 
berger, 1958), the majority of studies 
relating measures of general anxiety 
to measures of intellectual perfor- 
mance have yielded nonsignificant 
correlations (Dana, 1957; Davids & 
Eriksen, 1955; Goodstein & Farber, 
1957; Jackson & Bloomberg, 1958; 
Klugh & Bendig, 1955; Matarazzo, 
1955; Sarason, 1956b, 1959a; Schulz 
& Calvin, 1955; Taylor, 1955). 

Whether one should infer that high 
anxious Ss are less bright than other 
Ss when significant negative correla- 
tions between anxiety and intellec- 
tual performance are obtained de- 
pends on the interpretation placed on 
anxiety scales. The finding that 
under stressful conditions low anxious 
Ss perform at higher levels than high 
anxious Ss, and under nonstressful 
conditions high and low anxious Ss 
perform equally well, might suggest
that labeling a test as an intelligence
test [...] responses in
high anxious Ss which interfere with
[...]. Motivational and situational
variables associated
[...] have not yet been manipulated
systematically in studies
which attempt to relate anxiety and
intelligence. 

As was indicated earlier it would 
appear that, for college students, 
tests of the ACE type are unrelated 
to, or only very slightly related to, 
measures of general anxiety such as 
MAS. However, studies which have 
related test anxiety, i.e., anxiety ex- 
perienced in test situations, to meas- 
ures of intellectual performance have 
shown consistent negative correla- 
tions. The Ss scoring high in test 
anxiety obtain lower performance 
scores than Ss with lower scores 
(Cowen, 1957; Mandler & Cowen, 
1958; Sarason, 1957c, 1959a; Sarason 
& Mandler, 1952). In one study 
(Sarason, 1959a) in which both gen- 
eral and test anxiety indices were 
used, it was found that test anxiety 
correlated negatively with several in- 
tellectual measures for both male and 
female college students, but measures 
of general anxiety and other per- 
sonality variables were unrelated to 
intelligence.

An important problem in the study 
of the correlation between anxiety 
and intelligence which has not been 
given enough emphasis is that of the 
range of intellectual ability studied. 
If restricted ranges of ability are used,
it will make it less likely that significant
correlations will be obtained. Although
investigators [...]
of restricted [...]
(college students, air
force recruits,
student nurses), no
systematic attempts have been made
to study the relationships between
anxiety and intelligence in different
populations using similar measures of
anxiety and intelligence in all comparisons.
Spielberger (1958) and Calvin,
Koons, Bingham, and Fink
(1955) have presented evidence which
strongly suggests the need for such a
systematic consideration of sampling
variations. 
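
The attenuating effect of a restricted range can be shown by a small
simulation. The sketch below assumes a bivariate normal population with
a hypothetical correlation of -.30 between ability and anxiety, and an
arbitrary selection cutoff of one standard deviation above the mean on
ability; none of these values comes from the studies cited.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, true_r=-0.30, cutoff=1.0, seed=0):
    """Return (r in the full sample, r among cases with ability above cutoff).

    All parameter values are hypothetical and serve only as an illustration.
    """
    rng = random.Random(seed)
    ability, anxiety = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        ability.append(a)
        # Construct anxiety so that its population correlation with ability is true_r.
        anxiety.append(true_r * a + math.sqrt(1.0 - true_r ** 2) * e)
    selected = [(a, x) for a, x in zip(ability, anxiety) if a > cutoff]
    sel_ability = [a for a, _ in selected]
    sel_anxiety = [x for _, x in selected]
    return pearson_r(ability, anxiety), pearson_r(sel_ability, sel_anxiety)


full_r, restricted_r = simulate()
print(f"full range r = {full_r:.2f}; restricted range r = {restricted_r:.2f}")
```

In such a run the correlation computed within the selected group is
smaller in absolute size than that in the unselected sample, although
the underlying relationship is the same for every simulated case.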


ANXIETY AND PHYSIOLOGICAL 
VARIABLES 


As anxiety is defined clinically, it is 
typically assumed that it has im- 
portant physiological correlates. On 
the basis of assumptions of this type, 
several investigators have sought re- 
lationships between anxiety and a 
variety of physiological measures 
(e.g., GSR). Although work in this 
area seems only to be getting under 
way, the results to date have been 
largely negative. Measures of ques- 
tionnaire-defined anxiety such as 
MAS do not seem to relate consistently
to physiological responding
(Beam, 1955; Berry & Martin, 1957; 
Calvin, McGuigan, Tyrrell, & Soyars, 
1956; Lotsof & Downing, 1956; 
Raphelson, 1957). Although these 
negative findings can be taken as 
reflecting poorly on the validity of
MAS-type scales, it may also be that 
these scales are tapping aspects of 
anxiety other than autonomic func- 
tioning. It is known that there are 
marked individual differences among 
Ss in their physiological response
patterns under stress conditions
(Lacey, 1950; Lacey, Bateman, &
Van Lehn, 1953). Consequently, in
research relating anxiety and autonomic
response, it would seem desirable
to study patterns of physiolog-
ical responding rather than only 
one physiological response measure. 

Another important variable as 
yet unstudied in this area relates to 
the conditions under which Ss'
physiological responses are measured
(Martin & McGowan, 1955). The
situational and experimental conditions
under which an hypothesized
relationship should be present or not
present have not been explored. It is
known that even patients diagnosed
as anxiety states do not display
anxiety symptoms at all times and 
do not always show the same pat- 
terns of symptoms. Just as the habit 
interpretation of anxiety would ex- 
pect that, if one wished to maximize 
differences between high and low 
anxious Ss on intelligence tests, Ss 
would have to be run using highly
motivating, ego-involving conditions,
so also physiological differences be- 
tween Ss differing in anxiety might 
occur only under stressful or motivat- 
ing conditions. 


MEASUREMENT OF ANXIETY AND ITS 
RELATION TO PERSONALITY AND TEST 
TAKING ATTITUDES 


In constructing tests of personality 
we must ask ourselves many ques- 
tions related to their reliability and 
validity. What does the test purport 
to measure? What is the best format 
for the test? How does the test re- 
late to other available instruments? 
What are the best ways in which to 
establish the validity of the test? 

Jessor and Hammond (1957) have 
pointed out in relation to anxiety 
scales that some of these questions 
can ultimately be answered through 
the process of construct validation. 
Unfortunately, at present, the con- 
struct validation of anxiety scales is 
at a rudimentary stage. For ex- 
ample, is a true-false paper and pencil 
test the most appropriate measure of 
anxiety? At present we do not know 
the answer to this question. Prob- 
ably the major reason for the wide 
use of paper and pencil indices of 
anxiety is convenience. While con- 
venience is a desirable characteristic, 
research is needed to investigate less 
convenient but perhaps more useful 
indices. 

Perhaps the most parsimonious 
statement that one can make con- 
cerning what is measured by existing 
scales of anxiety is that they measure
the extent to which an individual is 
willing to admit to experiencing anx- 
iety in certain situations. However, 
also to be considered are the following 
possibilities: (a) high anxiety scores 
may be obtained by certain Ss be- 
cause of plus-getting tendencies, i.e., 
tendencies to attribute “bad” char- 
acteristics to themselves; (b) high 
scores may be obtained by particu- 
larly frank and open Ss; (c) high 
scores may be obtained by Ss who 
are particularly perceptive of their 
own reactions. The converse of each 
of these possibilities represents a 
possible basis for low anxiety scores. 
In this connection, it should be 
pointed out that many true-false 
scales of anxiety have been found to 
correlate very highly and negatively 
with measures of defensiveness, test- 
taking attitude, and the tendency to 
respond to personality test items in a
socially desirable direction (Ed- 
wards, 1957; Fordyce, 1956). Such 
high correlations may indicate that 
anxiety scores are explainable in 
terms of test-taking attitude. Wheth- 
er or not this is true is a problem 
that construct validation studies 
should be designed to answer. 
Interestingly, it is possible to con- 
struct scales of anxiety which do not 
correlate very highly with measures 
of test-taking attitude. The writer 
(Sarason, 1959a) has obtained corre- 
lations between the Test Anxiety 
Scale and SD of −.49 for women and
[...]. Several forced-choice scales of
anxiety have been presented which to
a very considerable degree reduce the
correlation of anxiety with measures
of test-taking attitude (Christie &
Budnitzky, 1957; Heineman, 1953;
Lykken, 1957; Silverman, 1957).
However, it is possible that
forced-choice techniques in the field of
personality measurement create as
many problems as they solve (Guil- 
ford, 1959, pp. 188-189). More re- 
search designed to measure anxiety 
in a variety of ways and to better 
understand anxiety and test-taking 
attitude relationships seems indi- 
cated. 

Two additional areas which re- 
quire further study are the relation- 
ship of measures of anxiety to (a) 
other personality dimensions and (b) 
to the clinical conditions of patients. 
With respect to anxiety and other 
personality measures, it appears that 
at least one test, the Psychasthenia, 
Pt, scale of the MMPI, correlates as 
highly with the MAS as the MAS 
correlates with itself (Brackbill & 
Little, 1954; Deese, et al., 1953; Erik-
sen & Davids, 1955). Although Pt-
MAS item overlap is clearly a factor
in these high correlations, this rela- 
tionship between MAS and Pt may 
suggest that high scorers on anxiety 
scales obtain such scores because of
ruminative, obsessive thinking about 
themselves. If scales of anxiety, or at 
least the MAS, are measuring a varia- 
ble related to obsessive-compulsive 
tendencies, then the positive correla- 
tions reported by some investigators 
(Davids, 1955b; Siegal, 1954) be-
tween MAS and measures of authori-
tarianism [...] observed in neurotic
obsessive-compulsive personalities.

As was mentioned earlier, the 
weight of the available evidence indi-
cates that scales of anxiety are tap-
ping tendencies towards neuroticism,
maladjustment, and self-dissatisfac-
tion (Bendig, 1958; Cowen, et al.,
1957; Davids & Eriksen, 1955; Fied- 
ler, et al., 1958; Holtzman, et al., 
1952; Winne, 1951). There are indi-
cations also that this heightened inse-
curity of high anxious individuals
may result in a greater susceptibility 
to persuasion and opinion change, 
and to greater sensitivity to reinforce-
ments provided by E to S in learning
situations (Fine, 1957; Janis, 1955; 
Sarason, 1958b; Taffel, 1955). For 
example, in two similar verbal condi- 
tioning studies both Taffel (1955) 
and Sarason (1958b) found that high 
anxious neuropsychiatric patients 
changed their frequency of usage of a 
verbal response class reinforced by E 
more easily than did patients with 
lower anxiety scores.

These sorts of relationships are 
consistent with the observations 
made in psychotherapeutic contacts
with patients that likelihood of move-
ment in therapy is, to a considerable
extent, positively related to the pa-
tient’s anxiety level. However, re-
sults of studies on the diagnostic
value of indices of anxiety do not as 
yet fall into clearly discernible pat- 
terns, and it is hard to draw generali- 
zations concerning the value of these 
indices as diagnostic tools. It can be 
said that a number of investigators 
have found anxiety scales to be cor- 
related either with indices of general 
maladjustment or ratings of anxiety 
made by clinicians (Buss, Weiner, 
Durkee, & Baer, 1955; Holtzman & 
Bitterman, 1956; Lauterbach, 1958; 
Matarazzo, Guze, & Matarazzo, 1955; 
Taft, 1957). The magnitude of these 
correlations, while significant, has 
often been so low as to preclude use 
in the individual case. Kendall 
(1954) has suggested that MAS be re- 
garded as only a rough clinical tool. 

An important methodological prob-
lem in relating anxiety indices to the
ratings of patients’ anxiety made by
clinicians is the method of obtaining
such judgments. Poorly constructed
rating scales will inevitably lead to
low-order relationships with other
measures. In this regard, attention
should be called to the interesting
study by Buss et al. (1955) in which
the use of adequate procedures to in-
sure interrater reliability among clini-
cians no doubt contributed to the
positive results obtained.


SUMMARY 


This paper has dealt with the rela-
tionship of anxiety to certain re-
search areas. Existing research sug-
gests the following summaries:

1. The performance of high anxi- 
ous Ss is detrimentally affected by 
verbally administered highly moti- 
vating communications. This result 
is consistent with the view that high 
anxious Ss emit personalized, self-
oriented interfering responses when
threat is perceived in the environ- 
ment. Under nonthreat conditions 
the emission of such responses would 
not be expected. It was pointed out 
that several methodological prob-
lems remain to be solved in the as-
sessment of the relationship between
anxiety and stress. On the E side
there is the confounding of variables 
such as experimental instructions 
with characteristics of the E admin- 
istering such instructions. On the S 
side, more must be learned about the 
relationship of sex and personality 
characteristics of Ss which affect their 
responses to conditions of implied 
threat. 

2. The results of several experi- 
ments using MAS as a measure of 
drive have indicated that, as task 
complexity increases, the disadvant- 
age of high to low anxious Ss appears 
to increase. However, there has been 
considerable research in which this 
relationship was not confirmed. Per- 
haps the major theoretical problem 
in the anxiety-task complexity rela- 
tionship is the interpretation to be 
placed on the complexity variable.
Complex tasks can be both difficult 
and emotionally arousing. It would 
appear that both of these aspects of 
task complexity must be considered. 

3. Although several reports of cor- 
relations between measures of general 
anxiety, such as the MAS, and intel- 
lectual measures are to be found in 
the literature, it does not appear that 
this relationship consistently holds. 
Specific test anxiety, on the other 
hand, does seem to relate negatively 
to intellectual measures. It has been 
suggested that indices of specific anx- 
ieties such as test anxiety may prove 
more valuable for specific purposes 
than more general indices like MAS. 

4. Negative findings seem to per- 
vade the study of the relationship of 
anxiety to physiological indices. The 
typical procedure has been to select 
Ss differing in anxiety scores and to 
compare these Ss on autonomic meas- 
ures such as GSR. It was suggested 
that the lack of significant relation- 
ships in such comparisons may be at- 
tributable to a failure to make the 
comparisons under conditions of per- 
ceived threat or stress. High and low 
anxious Ss may differ in physiological 
response under threat but not under 
nonthreat conditions. 

5. Problems of the effects of test- 
taking attitudes on anxiety scores 
and the format of anxiety scales have 
as yet not been given the intensive 
study which they merit. While most 
indices of anxiety of the MAS type 
have been found to correlate nega-
tively and very highly with measures
of test-taking attitudes (e.g., the K
scale of the MMPI), this has not
been obtained in all cases. Forced-
choice scales and the Test Anxiety
Scale do not correlate as highly
with test-taking attitude as do MAS
and other general anxiety indices.
Further construct validation of both
anxiety and test-taking attitude
scales may illuminate the significance 
of these findings. 

The aim of this paper has been to 
point to some of the consistencies and 
inconsistencies in the area of anxiety 
research and to suggest some of the
uncontrolled and confounding vari- 
ables which may have led to dis- 
crepant findings and which need to 
be systematically studied in future 
research. 
The theory of probability and sta-
tistical inference is various things to 
various people. To the mathemati- 
cian, it is an intricate formal calculus, 
to be explored and developed with 
little professional concern for any 
empirical significance that might at- 
tach to the terms and propositions 
involved. To the philosopher, it is an 
embarrassing mystery whose justifica- 
tion and conceptual clarification have 
remained stubbornly refractory to 
philosophical insight. (A famous 
philosophical epigram has it that in- 
duction [a special case of statistical 
inference] is the glory of science and 
the scandal of philosophy.) To the 
experimental scientist, however, sta- 
tistical inference is a research instru- 
ment, a processing device by which 
unwieldy masses of raw data may be 
refined into a product more suitable 
for assimilation into the corpus of sci- 
ence, and in this lies both strength 
and weakness. It is strength in that, 
as an ultimate consumer of statistical 
methods, the experimentalist is in 
position to demand that the tech- 
niques made available to him con- 
form to his actual needs. 
[But it is] also weakness [in that, in
his need for] the tools constructed by a
highly [technical formal discipline,] the ex-
perimentalist, who has specialized
along other lines, seldom feels compe- 
tent to extend criticisms or even com- 
ments; he is much more likely to 
make unquestioning application of 
procedures learned more or less by 
rote from persons assumed to be more 
knowledgeable of statistics than he. 
There is, of course, nothing surprising 
or reprehensible about this—one
need not understand the principles of 
a complicated tool in order to make 
effective use of it, and the research 
scientist can no more be expected to 
have sophistication in the theory of 
statistical inference than he can be 
held responsible for the principles of 
the computers, signal generators, 
timers, and other complex modern 
instruments to which he may have re- 
course during an experiment. None- 
theless, this leaves him particularly 
vulnerable to misinterpretation of 
his aims by those who build his in- 
struments, not to mention the ever 
present dangers of selecting an inap-
propriate or outmoded tool for the
job at hand, misusing the proper tool, 
or improvising a tool of unknown 
adequacy to meet a problem not con-
forming to the simple theoretical situ-
ations in terms of which existent in-
struments have been analyzed. Fur-
ther, since behaviors once exercised
tend to crystallize into habits and 
eventually traditions, it should come 
as no surprise to find that the tribal 
rituals for data-processing passed 
along in graduate courses in experi- 
mental method should contain ele- 
ments justified more by custom than 
by reason.
In this paper, I wish to examine a 
dogma of inferential procedure which, 
for psychologists at least, has at-
tained the status of a religious con-
viction. The dogma to be scrutinized
is the “null-hypothesis significance 
test” orthodoxy that passing statisti-
cal judgment on a scientific hypothe-
sis by means of experimental observa-
tion is a decision procedure wherein
one rejects or accepts a null hypothe- 
sis according to whether or not the 
value of a sample statistic yielded by 
an experiment falls within a certain 
predetermined “rejection region” of 
its possible values. The thesis to be 
advanced is that despite the awe- 
some pre-eminence this method has 
attained in our experimental jour- 
nals and textbooks of applied sta- 
tistics, it is based upon a funda- 
mental misunderstanding of the na- 
ture of rational inference, and is sel- 
dom if ever appropriate to the aims 
of scientific research. This is not a 
particularly original view—tradi-
tional null-hypothesis procedure has
already been superseded in modern
statistical theory by a variety of 
more satisfactory inferential tech- 
niques. But the perceptual defenses 
of psychologists are particularly effi- 
cient when dealing with matters of 
methodology, and so the statistical 
folkways of a more primitive past 
continue to dominate the local scene. 
To examine the method in question
in greater detail, and expose some of 
the discomfitures to which it gives 
rise, let us begin with a hypothetical 
case study. 


A CASE STUDY IN NULL-HYPOTHESIS
PROCEDURE; OR, A QUORUM OF
EMBARRASSMENTS


Suppose that according to the the-
ory of behavior, T₀, held by most
right-minded, respectable behavior-
ists, the extent to which a certain be-
havioral manipulation M facilitates
learning in a certain complex learn-
ing situation C should be null. That
is, if “φ” designates the degree to
which manipulation M facilitates the
acquisition of habit H under cir-
cumstances C, it follows from the
orthodox theory T₀ that φ=0. Also
suppose, however, that a few radicals
have persistently advocated an al-
ternative theory T₁ which entails,
among other things, that the facilita-
tion of H by M in circumstances C
should be appreciably greater than
zero, the precise extent being de-
pendent upon the values of certain
parameters in C. Finally, suppose
that Igor Hopewell, graduate student
in psychology, has staked his disser-
tation hopes on an experimental test
of T₀ against T₁ on the basis of their
differential predictions about the
value of φ.

Now, if Hopewell is to carry out his 
assessment of the comparative mer-
its of T₀ and T₁ in this way, there is
nothing for him to do but submit a 
number of Ss to manipulation M 
under circumstances C and compare 
their efficiency at acquiring habit H 
with that of comparable Ss who, 
under circumstances C, have not been 
exposed to manipulation M. The 
difference, d, between experimental 
and control Ss in average learning 
efficiency may then be taken as an 
operational measure of the degree, φ,
to which M influences acquisition of 
H in circumstances C. Unfortu- 
nately, however, as any experienced 
researcher knows to his sorrow, the 
interpretation of such an observed 
statistic is not quite so simple as that. 
For the observed dependent variable 
d, which is actually a performance 
measure, is a function not only of the 
extent to which M influences acquisi-
tion of H, but of many additional
major and minor factors as well.
Some of these, such as deprivations,
species, age, laboratory conditions,
etc., can be removed from considera-
tion by holding them essentially con-
stant. Others, however, are not so
easily controlled, especially those
customarily subsumed under the
headings of “individual differences”
and “errors of measurement.” To
curtail a long mathematical story, it
turns out that with suitable (possibly
justified) assumptions about the dis-
tributions of values for these uncon-
trolled variables, the manner in
which they influence the dependent
variable, and the way in which ex-
perimental and control Ss were se-
lected and manipulated, the ob-
served sample statistic d may be re-
garded as the value of a normally dis-
tributed random variate whose aver-
age value is φ and whose variance,
which is independent of φ, is unbi-
asedly estimated by the square of
another sample statistic, s, computed
from the data of the experiment.¹
The import of these statistical con-
siderations for Hopewell’s disserta-
tion, of course, is that he will not be
permitted to reason in any simple
way from the observed d to a conclu-
sion about the comparative merits of
T₀ and T₁. To conclude that T₀,
rather than T₁, is correct, he must
argue that φ=0, rather than φ>0.
But the observed d, whatever its
value, is logically compatible both
with the hypothesis that φ=0 and
the hypothesis that φ>0. How then,
can Hopewell use his data to make a
[comparison of T₀ and T₁? He is to
compute d/s as a t] statistic, con-
sult [a table of the] t distributions
[for] the appropriate degrees-of-
[freedom (df), and] announce his experi-
ment as disconfirming or supporting
T₀, respectively, according to whether
or not the discrepancy between d and
the zero value expected under T₀
is “statistically significant”—i.e.,
whether or not the observed value of
d/s falls outside of the interval be-
tween two extreme percentiles (usu-
ally the 2.5th and 97.5th) of the t
distribution with that df. If asked
by his dissertation committee to jus-
tify this behavior, Hopewell would
rationalize something like the follow-
ing (the more honest reply, that this
is what he has been taught to do,
not being considered appropriate to
such occasions):


In deciding whether or not T₀ is correct, I
can make two types of mistakes: I can reject
T₀ when it is in fact correct [Type I error], or
I can accept T₀ when in fact it is false [Type
II error]. As a scientist, I have a professional
obligation to be cautious, but a 5% chance of
error is not unduly risky. Now if all my
statistical background assumptions are cor-
rect, then, if it is really true that φ=0 as T₀
says, there is only one chance in 20 that my
observed statistic d/s will be smaller than
t.025 or larger than t.975, where by the latter I
mean, respectively, the 2.5th and 97.5th per-
centiles of the t distribution with the same
degrees-of-freedom as in my experiment.
Therefore, if I reject T₀ when d/s is smaller
than t.025 or larger than t.975, and accept T₀
otherwise, there is only a 5% chance that I
will reject T₀ incorrectly.


¹ s is here the estimate of the standard error
of the difference in means, not the estimate of
the individual SD.


If asked about his Type II error, and
why he did not choose some other re-
jection region, say between t.475 and
t.525, which would yield the same prob-
ability of Type I error, Hopewell
should reply that although he has no
way to compute his probability of
Type II error under the assumptions
traditionally authorized by null-hy-
pothesis procedure, it is presumably
minimized by taking the rejection re-
gion at the extremes of the t distribu-
tion.
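
Hopewell's reply can be illustrated numerically. The sketch below is not part of the original argument: it substitutes a standard normal distribution for the t distribution (a large-sample stand-in) and assumes a hypothetical true standardized effect of 2.0 under the alternative, a value that null-hypothesis procedure itself never supplies. The function names are likewise illustrative. Under those assumptions, the tail region and the central region carry the same 5% Type I risk, but the central region almost never rejects the null hypothesis when it is false.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: an assumed large-sample stand-in for t


def reject_prob_tails(shift, alpha=0.05):
    """P(statistic lies outside [z(alpha/2), z(1 - alpha/2)]) when its mean is `shift`."""
    lo, hi = Z.inv_cdf(alpha / 2), Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(lo - shift) + (1 - Z.cdf(hi - shift))


def reject_prob_center(shift, alpha=0.05):
    """P(statistic lies between z(.5 - alpha/2) and z(.5 + alpha/2)) when its mean is `shift`."""
    lo, hi = Z.inv_cdf(0.5 - alpha / 2), Z.inv_cdf(0.5 + alpha / 2)
    return Z.cdf(hi - shift) - Z.cdf(lo - shift)


ASSUMED_EFFECT = 2.0  # hypothetical standardized effect under the alternative

type1_tails = reject_prob_tails(0.0)
type1_center = reject_prob_center(0.0)
power_tails = reject_prob_tails(ASSUMED_EFFECT)
power_center = reject_prob_center(ASSUMED_EFFECT)

print(f"Type I risk   tails: {type1_tails:.3f}   center: {type1_center:.3f}")
print(f"Type II risk  tails: {1 - power_tails:.3f}   center: {1 - power_center:.3f}")
```

The comparison only becomes possible once a specific alternative is assumed, which is exactly the information Hopewell says the traditional procedure does not authorize.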


Let us suppose that for Hopewell’s
data, d=8.50, s=5.00, and df=20.
Then t.975=2.09 and the acceptance
region for the null hypothesis φ=0 is
−2.09<d/s<2.09, or −10.45<d
<10.45. Since d does fall within this
region, standard null-hypothesis de-
cision procedure, which I shall hence-
forth abbreviate “NHD,” dictates
that the experiment is to be reported
as supporting theory T₀. (Although
many persons would like to conceive
NHD testing to authorize only re-
jection of the hypothesis, not, in ad-
dition, its acceptance when the test
statistic fails to fall in the rejection
region, if failure to reject were not
taken as grounds for acceptance,
then NHD procedure would involve
no Type II error, and no justification
would be given for taking the rejec-
tion region at the extremes of the
distribution, rather than in its mid-
dle.) But even as Hopewell reaffirms
T₀ in his dissertation, he begins to
feel uneasy. In fact, several disquiet-
ing thoughts occur to him:

1. Although his test statistic falls
within the orthodox acceptance re-
gion, a value this divergent from the
expected zero should nonetheless be
[encountered] less than once in 10. To
argue in favor of a hypothesis on the
basis of data ascribed a p value no
greater than .10 (i.e., 10%) by that
hypothesis certainly does not seem to
be one of the more impressive dis-
plays of scientific caution.

2. After some belated reflection
on the details of theory T₁, Hopewell
observes that T₁ not only predicts
that φ>0, but with a few simplifying
assumptions no more questionable
than is par for this sort of course, the
value that φ should have can actu-
ally be computed. Suppose the value
derived from T₁ in this way is φ=10.0.
Then, rather than taking φ=0 as the
null hypothesis, one might just as well
take φ=10.0; for under the latter,
(d−10.0)/s is a 20 df t statistic, giving
a two-tailed, 95% significance, accept-
ance region for (d−10.0)/s between
−2.09 and 2.09. That is, if one lets
T₁ provide the null hypothesis, it is
accepted or rejected according to
whether or not −.45<d<20.45, and
by this latter test, therefore, Hope-
well’s data must be taken to support
T₁—in fact, the likelihood under T₁
of obtaining a test statistic this di-
vergent from the expected 10.0 is a
most satisfactory three chances in
four. Thus it occurs to Hopewell
that had he chosen to cast his pro-
fessional lot with the T₁-ists by
selecting φ=10.0 as his null hypothe-
sis, he could have made a strong
argument in favor of T₁ by precisely
the same line of statistical reasoning
he has used to support T₀ under
φ=0 as the null hypothesis. That is,
he could have made an argument
that persons partial to T₁ would re-
gard as strong. For behaviorists who
are already convinced that T₀ is cor-
rect would howl that since T₀ is the
dominant theory, only φ=0 is a
legitimate null hypothesis. (And
is it not strange that what constitutes
a valid statistical argument should
be dependent upon the majority
opinion about behavior theory?)

3. According to the NHD test of a
hypothesis, only two possible final
outcomes of the experiment are rec-
ognized—either the hypothesis is
rejected or it is accepted. In Hope-
well’s experiment, all possible values
of d/s between −2.09 and 2.09 have
the same interpretive significance,
namely, indicating that φ=0, while
conversely, all possible values of d/s
greater than 2.09 are equally taken
to signify that φ>0. But Hopewell
finds this disturbing, for of the vari-
ous possible values that d/s might
have had, the significance of d/s=1.70
for the comparative merits of T₀ and
T₁ should surely be more similar to
that of, say, d/s=2.10 than to that
of, say, d/s=−1.70.
4. In somewhat similar vein, it
also occurs to Hopewell that had he
opted for a somewhat riskier confi-
dence level, say a Type I error of
10% rather than 5%, d/s would have
fallen outside the region of accept-
ance and T₀ would have been re-
jected. Now surely the degree to
which a datum corroborates or im-
pugns a proposition should be inde-
pendent of the datum-assessor’s per-
sonal temerity. Yet according to
orthodox significance-test procedure,
whether or not a given experimental
outcome supports or disconfirms the
hypothesis in question depends cru-
cially upon the assessor’s tolerance
for Type I risk.
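
The arithmetic behind these four misgivings can be checked with a short calculation. The following is a minimal sketch using only the Python standard library; the helper names are illustrative and the t distribution is integrated numerically rather than read from a table, so the critical values differ slightly from the rounded 2.09 used in the text. It computes the 5% acceptance region for d and the two-tailed probability of a result as divergent as d=8.50, first with φ=0 and then with φ=10.0 as the null hypothesis.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Cumulative probability by Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Invert t_cdf by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_tailed_p(d, s, df, null_value):
    """Probability, under the null, of a statistic at least this far from null_value."""
    t = (d - null_value) / s
    return 2 * (1 - t_cdf(abs(t), df))


def acceptance_region(s, df, null_value, alpha):
    """Interval of d values for which the null hypothesis is 'accepted'."""
    crit = t_quantile(1 - alpha / 2, df)
    return null_value - crit * s, null_value + crit * s


d, s, df = 8.50, 5.00, 20  # Hopewell's data as given in the text
for null_value in (0.0, 10.0):
    lo, hi = acceptance_region(s, df, null_value, alpha=0.05)
    p = two_tailed_p(d, s, df, null_value)
    print(f"null phi={null_value:4.1f}: accept if {lo:.2f} < d < {hi:.2f}; two-tailed p = {p:.3f}")

print(f"10% two-tailed critical value of d/s: {t_quantile(0.95, df):.3f}")
```

Run this way, the same datum falls inside both acceptance regions, with a probability of roughly three in four under φ=10.0. Under φ=0 the two-tailed probability comes out close to .10 (marginally above it), and the 10% critical value comes out near 1.72, so d/s=1.70 lies almost exactly on the boundary invoked in points 1 and 4.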

Despite his inexperience, Igor Hope-
well is a sound experimentalist at
heart, and the more he reflects on
these statistics, the more dissatisfied
with his conclusions he becomes. So
while the exigencies of graduate cir-
cumstances and publication require-
ments urge that his dissertation be
written as a confirmation of T₀, he
nonetheless resolves to keep an open
mind on the issue, even carrying out
further research if opportunity per-
mits. And reading his experimental
report, so of course would we—has
any responsible scientist ever made
up his mind about such a matter on
the basis of a single experiment? Yet
in this obvious way we reveal how
little our actual inferential behavior
corresponds to the statistical proce-
dure to which we pay lip-service.
For if we did [in fact accept or reject]
the null hypothesis according to
whether the [sample statistic falls in]
the acceptance [or in the rejection]
region, then there would be no repli-
cations [...]
the hypothesis in question. And the
fact that in actual practice, a single
finding seldom even tempts us to
such closure of judgment reveals
how little the conventional model of
hypothesis testing fits our actual
evaluative behavior.


DECISIONS VS. DEGREES OF BELIEF


By now, it should be obvious that 
something is radically amiss with the 
traditional NHD assessment of an 
experiment’s theoretical import. Ac- 
tually, one does not have to look far 
in order to find the trouble—it is sim- 
ply a basic misconception about the 
purpose of a scientific experiment. 
The null-hypothesis significance test 
treats acceptance or rejection of a 
hypothesis as though these were 
decisions one makes on the basis of the 
experimental data—i.e., that we elect 
to adopt one belief, rather than an- 
other, as a result of an experimental 
outcome. But the primary aim of a
scientific experiment is not to precipi-
tate decisions, but to make an appropri-
ate adjustment in the degree to which
one accepts, or believes, the hypothesis
or hypotheses being tested. And even
if the purpose of the experiment were
to reach a decision, it could not be a
decision to accept or reject the hy-
pothesis, for decisions are [voluntary]
commitments [...], whereas acceptance or
rejection of a hypothesis is a cognitive
state which may provide the basis
[for rational decisions ...]. [The situation is as]
follows: As scientists, it is our profes-
sional obligation to reason from avail-
able data to explanations and gen-
eralities—i.e., beliefs—which are sup-
ported by these data. But belief in
(i.e., acceptance of) a proposition is
not an all-or-none affair; rather, it is
a matter of degree, and the extent to


~- 
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Proposition translates pragmatically 
i the extent to which he is willing 
commit himself to the behavioral 
sumens prescribed for him by 
i: meaning of that proposition. For 
xample, if that inveterate gambler, 


nfortunate Q. Smith, has complete’ 


oe that War Biscuit will win 
willin th race at Belmont, he will be 

ing to accept any odds to place a 
i re War Biscuit to win; for if he is 
nae certain that War Biscuit 
ie. wf at then odds are irrelevant—It 
coll mply a matter of arranging to 

ect some winnings after the race. 

n the other hand, the more that 

mith has doubts about War Biscuit's 
prospects; the higher the odds he will 
Fma before betting. That is, the 
extent to which Smith accepts or re- 
Sii the hypothesis that War Biscu t 
; ill win the fifth at Belmont is an 
Mportant determinant of his betting 
decisions for that race. 

Now, although a scientist’s data
supply evidence for the conclusions
he draws from them, only in the un-
likely case where the conclusions are
logically deducible from or logically
incompatible with the data do the
data warrant that the conclusions be
entirely accepted or rejected. Thus,
e.g., the fact that War Biscuit has
won all 16 of his previous starts is
strong evidence in favor of his win-
ning the fifth at Belmont, but by no
means warrants the unreserved ac-
ceptance of this hypothesis. More
generally, the data available confer
upon the conclusions a certain ap-
propriate degree of belief, and it is the
inferential task of the scientist to pass
from the data of his experiment to
whatever extent of belief these and
other available information justify
in the hypothesis under investigation.
In particular, the proper inferential
procedure is not (except in the deduc-
tive case) a matter of deciding to
accept (without qualification) or re-
ject (without qualification) the hy-
pothesis: even if adoption of a belief
were a matter of voluntary action—
which it is not—neither such ex-
tremes of belief or disbelief are ap-
propriate to the data at hand. As an
example of the disastrous conse- 
quences of an inferential procedure 
which yields only two judgment 
values, acceptance and rejection, 
consider how sad the plight of Smith 
would be if, whenever weighing the 
prospects for a given race, he always 
worked himself into either supreme 
confidence or utter disbelief that a 
certain horse will win. Smith would 
rapidly impoverish himself by ac- 
cepting excessively low odds on 
horses he is certain will win, and fail- 
ing to accept highly favorable odds on 
horses he is sure will lose. In fact, 
Smith’s two judgment values need 
not be extreme acceptance and rejec- 
tion in order for his inferential proce- 
dure to be maladaptive. All that is 
required is that the degree of belief 
arrived at be in general inappropriate 
to the likelihood conferred on the 
hypothesis by the data. 
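
Smith's plight can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not something taken from the original: the four races, their win probabilities, and the offered odds (net winnings per unit staked) are all hypothetical, and the bettor is assumed to accept a bet whenever it looks favorable under his own degree of belief. One bettor holds beliefs equal to the actual probabilities; the other, like Smith, rounds every belief to complete acceptance or complete rejection.

```python
def expected_profit(true_prob, odds, belief):
    """Expected profit per unit staked for a bettor who takes the bet only
    when it looks favorable under his own degree of belief."""
    looks_favorable = belief * odds - (1 - belief) > 0
    if not looks_favorable:
        return 0.0  # bet declined, nothing won or lost
    # The actual expectation depends on the true probability, not the belief.
    return true_prob * odds - (1 - true_prob)


# Hypothetical races: (true probability of winning, odds offered)
races = [(0.9, 0.05), (0.6, 0.5), (0.3, 4.0), (0.1, 20.0)]

# Beliefs matched to the probabilities conferred by the evidence.
calibrated = sum(expected_profit(p, odds, belief=p) for p, odds in races)

# Two-valued beliefs: supreme confidence or utter disbelief.
all_or_none = sum(
    expected_profit(p, odds, belief=1.0 if p >= 0.5 else 0.0) for p, odds in races
)

print(f"calibrated bettor:  {calibrated:+.3f}")
print(f"all-or-none bettor: {all_or_none:+.3f}")
```

With these made-up figures the all-or-none bettor takes the two unfavorable bets on likely winners and passes up the two favorable bets on long shots, while the calibrated bettor does the reverse.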

Now, the notion of “degree of be- 
lief appropriate to the data at hand” 
has an unpleasantly vague, subjec- 
tive feel about it which makes it 
unpalatable for inclusion in a formal- 
ized theory of inference. Fortunately 
a little reflection about this phrase 
reveals it to be intimately connected 
with another concept relating con- 
clusion to evidence which, though 
likewise in serious need of conceptual 
clarification, has the virtues both of 
intellectual respectability and statis- 
tical familiarity. I refer, of course, to 
the likelihood, or probability, con- 
ferred upon a hypothesis by available 
evidence. Why should not Smith feel
certain, in view of the data available,
that War Biscuit will win the fifth at 
Belmont? Because it is not certain 
that War Biscuit will win. More 
generally, what determines how strong- 
ly we should accept or reject a propo- 
sition is the probability given to this 
hypothesis by the information at 
hand. For while our voluntary ac- 
tions (i.e., decisions) are determined 
by our intensities of belief in the rele- 
vant propositions, not by their actual 
probabilities, expected utility is max- 
imized when the cognitive weights 
given to potential but not yet known-
for-certain pay-off events are repre-
sented in the decision procedure by
the probabilities of these events. We 
may thus relinquish the concept of 
“appropriate degree of belief” in 
favor of “probability of the hypoth-
esis,” and our earlier contention
about the nature of data-processing 
may be rephrased to say that the 
proper inferential task of the experi- 
mental scientist is not a simple ac- 
ceptance or rejection of the tested 
hypothesis, but determination of the 
probability conferred upon it by the 
experimental outcome. This likeli-
hood of the hypothesis relative to
whatever data are available at the 
moment will be an important deter- 
minant for decisions which must cur-
[...] his hypotheses,
and he is interested in the probability
ascribed by a hypothesis to an ob-
served experimental outcome only to
the extent he is able to reason back- 
wards to the likelihood of the hy- 
pothesis, given this outcome. Put 
crudely, no matter how improbable 
an observation may be under the 
hypothesis (and when there are an 
infinite number of possible outcomes, 
the probability of any particular one 
of these is, usually, infinitely small— 
the familiar value for an observed 
statistic under a hypothesis H is not 
actually the probability of that out-
come under H, but a partial integral
of the probability-density function of
possible outcomes under H), it is still
confirmatory (or at least nondiscon- 
firmatory, if one argues from the data 
to rejection of the background as- 
sumptions) so long as the likelihood 
of the observation is even smaller 
under the alternative hypotheses. To 
be sure, the theory of hypothesis- 
likelihood and inverse probability is 
as yet far from the level of develop- 
ment at which it can furnish the re-
search scientist with inferential tools
he can apply mechanically to obtain
a definite likelihood estimate. But to
the extent a statistical method does
[...] hypothesis, [...] observation,
that method is not truly a method of
inference, and is unsuited for the sci-
entist’s cognitive ends.


THE METHODOLOGICAL STATUS OF
THE NULL-HYPOTHESIS SIGNIFI-
CANCE TEST


The preceding arguments have, in
one form or another, raised several
doubts about the appropriateness of
conventional significance-test deci-
sion procedure for the aims it is sup-
posed to achieve. It is now time to
bring these charges together in an
explicit bill of indictment.


1. The null-hypothesis significance
test treats acceptance or rejection
of a hypothesis as though these
were decisions one makes. But a
hypothesis is not something, like a
piece of pie offered for dessert, which
can be accepted or rejected by a
voluntary physical action. Accept-
ance or rejection of a hypothesis is a
cognitive process, a degree of believ-
ing or disbelieving which, if rational,
is not a matter of choice but deter-
mined solely by how likely it is, given
the evidence, that the hypothesis is
true.

2. It might be argued that the
NHD test may nonetheless be re-
garded as a legitimate decision pro-
cedure if we translate “acceptance
(rejection) of the hypothesis” as
meaning “acting as though the hy-
pothesis were true (false).” And to
be sure, there are many occasions on
which one must base a course of ac-
tion on the credibility of a scientific
hypothesis. (Should these data be
published? Should I devote my re-
search resources to and become iden-
tified professionally with this theory?
Can we test this new Z bomb without
exterminating all life on earth?) But
such a move to salvage the tradi- 
tional procedure only raises two fur- 
ther objections. (a) While the scien- 
tist—i.e., the person—must indeed 
make decisions, his science is a sys- 
tematized body of (probable) knowl-
edge, not an accumulation of deci-
sions. The end product of a scientific
investigation is a degree of confidence 
in some set of propositions, which 
then constitutes a basis for decisions. 
(b) Decision theory shows the NHD 
test to be woefully inadequate as a 
decision procedure. In order to de- 
cide most effectively when or when 
not to act as though a hypothesis is 
correct, one must know both the 
probability of the hypothesis under 
the data available and the utilities of
the various decision outcomes (i.e.,
the values of accepting the hypothe-
sis when it is true, of accepting it
when it is false, of rejecting it when 
it is true, and of rejecting it when it is 
false). But traditional NHD proce- 
dure pays no attention to utilities at 
all, and considers the probability of 
the hypothesis, given the data—i.e., 
the inverse probability—only in the 
most rudimentary way (by taking 
the rejection region at the extremes 
of the distribution rather than in its 
middle). Failure of the traditional 
significance test to deal with inverse 
probabilities invalidates it not only 
as a method of rational inference, but 
also as a useful decision procedure. 
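
The point made under (b) can be illustrated with a minimal sketch, assuming that the probability of the hypothesis and the four utilities are already available as numbers. The function names and the utility values below are hypothetical and chosen only for illustration; nothing here is taken from the NHD procedure itself.

```python
def expected_utility(p_true, u_if_true, u_if_false):
    """Expected utility of an action when the hypothesis has probability p_true."""
    return p_true * u_if_true + (1.0 - p_true) * u_if_false


def best_action(p_true, utilities):
    """Pick the action with the largest expected utility.

    utilities maps an action name to a pair
    (utility if the hypothesis is true, utility if it is false).
    """
    scores = {
        action: expected_utility(p_true, u_true, u_false)
        for action, (u_true, u_false) in utilities.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical utilities: acting on a false hypothesis is costly,
# ignoring a true one is mildly costly.
UTILITIES = {
    "act_as_if_true": (10.0, -50.0),
    "act_as_if_false": (-5.0, 0.0),
}

for p in (0.5, 0.8, 0.95):
    action, scores = best_action(p, UTILITIES)
    print(p, action, scores)
```

With these assumed utilities the preferred action changes somewhere between a probability of .5 and .8, a point fixed jointly by the probability and the utilities rather than by a conventional significance level.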
3. The traditional NHD test un- 
realistically limits the significance of 
an experimental outcome to a mere 
two alternatives, confirmation or 
disconfirmation of the null hypothe- 
sis. Moreover, the transition from 
confirmation to disconfirmation as a 
function of the data is discontinuous 
—an arbitrarily small difference in 
the value of the test statistic can 
change its significance from con- 
firmatory to disconfirmatory. Fi- 
nally, the point at which this transi- 
tion occurs is entirely gratuitous. 
There is absolutely no reason (at 
least provided by the method) why 
the point of statistical “significance”
should be set at the 95% level, rather
than, say, the 94% or 96% level. Nor
does the fact that we sometimes select 
a 99% level of significance, rather 
than the usual 95% level, mitigate
this objection; one is as arbitrary as
the other.
4. The null-hypothesis significance
test introduces a strong bias in favor
of one out of what may be a large
number of reasonable alternatives.
When sampling a distribution of un-
known mean μ, different assumptions
about the value of μ furnish an infi-
nite number of alternate null hy-
potheses by which we might assess
the sample mean, and whichever hy- 
pothesis is selected is thereby given 
an enormous, in some cases almost 
insurmountable, advantage over its 
competitors. That is, NHD proce- 
dure involves an inferential double 
standard—the favored hypothesis is 
held innocent unless proved guilty, 
while any alternative is held guilty 
until no choice remains but to judge 
it innocent. What is objectionable 
here is not that some hypotheses are 
held more resistant to experimental 
extinction than others, but that the 
differential weighing is an all-or-none 
side effect of a personal choice, and 
especially, that the method necessi- 
tates one hypothesis being favored 
over all the others. In the classical 
theory of inverse probability, on the 
other hand, all hypotheses are treated 
on a par, each receiving a weight (i.e., 
its “a priori” probability) which re- 
flects the credibility of that hypothe- 
sis on grounds other than the data 
being assessed. 

5. Finally, if anything can reveal 
the practical irrelevance of the con- 
ventional significance test, it should 
be its failure to see genuine applica- 
tion to the inferential behavior of the 
research scientist. Who has ever 
given up a hypothesis just because 
one experiment yielded a test statis- 
tic in the rejection region? And what 
scientist in his right mind would ever 
feel there to be an appreciable differ- 
ence between the interpretive signifi- 
cance of data, say, for which one- 
tailed p=.04 and that of data for 
which p=.06, even though the point 
of “significance” has been set at 
p=.05? In fact, the reader may well 
feel undisturbed by the charges 
raised here against traditional NHD 
procedure precisely because, without
perhaps realizing it, he has never
taken the method seriously anyway.
Paradoxically, it is often the most 
firmly institutionalized tenet of faith 
that is most susceptible to untroubled 
disregard—in our culture, one must 
early learn to live with sacrosanct 
verbal formulas whose import for 
practical behavior is seldom heeded. 
I suspect that the primary reasons 
why null-hypothesis significance test- 
ing has attained its current ritualistic 
status are (a) the surcease of meth- 
odological insecurity afforded by 
having an inferential algorithm on 
the books, and (b) the fact that a by- 
product of the algorithm is so useful, 
and its end product so obviously 
inappropriate, that the latter can be 
ignored without even noticing that 
this has, in fact, been done. What 
has given the traditional method its 
spurious feel of usefulness is that the 
first, and by far most laborious, step 
in the procedure, namely, estimating
the probability of the experimental
outcome under the assumption that a
certain hypothesis is correct, is also a
crucial first step toward what one is
genuinely concerned with, namely, 
an idea of the likelihood of that hy- 
pothesis, given this experimental out- 
come. Having obtained this most 
valuable statistical information under 
pretext of carrying through a con- 
ventional significance test, it is then 
tempting, though of course quite 
inappropriate, to heap honor and 
gratitude upon the method while 
overlooking that its actual result,
namely, a decision to accept or re-
ject, is not used at all.


TOWARD A MORE REALISTIC AP- 
PRAISAL OF EXPERIMENTAL DATA 


So far, my arguments have tended 
to be aggressively critical—one can 
hardly avoid polemics when butcher-
ing sacred cows. But my purpose is
not just to be contentious, but to help
clear the way for more realistic tech- 
niques of data assessment, and the 
time has now arrived for some con- 
structive suggestions. Little of what 
follows pretends to any originality; 
I merely urge that ongoing develop- 
ments along these lines should receive 
maximal encouragement.

For the statistical theoretician, 
the following problems would seem to 
be eminently worthy of research: 

1. Of supreme importance for the 
theory of probability is analysis of 
what we mean by a proposition’s
“probability,” relative to the evi-
dence provided. Most serious stu-
dents of the philosophical foundations
of probability and statistics agree
(cf. Braithwaite, pp. 119f.) that the
probability of a proposition (e.g.,
the probability that the General
Theory of Relativity is correct) does
not, prima facie, seem to be the same
sort of thing as the probability of an
event-class (e.g., the probability of
getting a head when this coin is
tossed). Do the statistical concepts
and formulas which have been de-
veloped for probabilities of the latter
kind also apply to hypothesis likeli-
hoods? In particular, are the proba-
bilities of hypotheses quantifiable at
all, and for the theory of inverse
probability, do Bayes’ theorem and
its probability-density refinements
apply to hypothesis probabilities?
These and similar questions are
urgently in need of clarification.

2. If we are willing to assume that 
Bayes’ theorem, or something like it, 
holds for hypothesis probabilities, there 
is much that can be done to develop 
the classical theory of inverse proba- 
bility. While computation of inverse 
probabilities turns essentially upon 
the parametric a priori probability 
function, which states the probability 
of each alternative hypothesis in the
set under consideration prior to the
outcome of the experiment, it should 
be possible to develop theorems which 
are invariant over important sub- 
classes of a priori probability func- 
tions. In particular, the difference 
between the a priori probability 
function and the “a posteriori” prob-
ability function (i.e., the probabil-
ities of the alternative hypotheses
after the experiment), perhaps ana-
lyzed as a difference in “information,”
should be a potentially fruitful source
of concepts with which to explore
such matters as the “power” or
“efficiency” of various statistics, the 
acquisition of inductive knowledge 
through repeated experimentation, 
etc. Another problem which seems 
to me to have considerable import, 
though not one about which I am 
sanguine, is whether inverse-proba- 
bility theory can significantly be 
extended to hypothesis-probabilities, 
given knowledge which is only proba- 
bilistic. That is, can a theory of 
sentences of form “The probability 
of hypothesis H, given that E is the 
case, is p,” be generalized to a theory 
of sentences of form “The probability 
of hypothesis H, given that the prob-
ability of E is q, is r”? Such a theory
would seem to be necessary, e.g., if 
we are to cope adequately with the 
uncertainty attached to the back- 
ground assumptions which always 
accompany a statistical analysis.

My suggestions for applied statis-
tical analysis turn on the fact that
while what is desired is the a poste-
riori probabilities of the various
alternative hypotheses, computation
of these by classical theory requires
the corresponding a priori probabil-
ity distribution, and in the more
immediate future, at least, this will
exist only as a subjective feel, differ-
ing from one person to the next,
about the credibilities of the various
hypotheses.

3. Whenever possible, the basic 
statistical report should be in the 
form of a confidence interval. Briefly, 
a confidence interval is a subset of 
the alternative hypotheses computed 
from the experimental data in such a 
way that for a selected confidence 
level α, the probability that the true
hypothesis is included in a set so
obtained is α. Typically, an α-level
confidence interval consists of those
hypotheses under which the p value
for the experimental outcome is
larger than 1−α (a feature of con-
fidence intervals which is sometimes
confused with their definition), in 
which case the confidence-interval 
report is similar to a simultaneous 
null-hypothesis significance test of 
each hypothesis in the total set of 
alternatives. Confidence intervals 
are the closest we can at present 
come to quantitative assessment of 
hypothesis-probabilities (see technical 
note, below), and are currently our 
most effective way to eliminate hy- 
potheses from practical considera- 
tion—if we choose to act as though 
none of the hypotheses not included 
in a 95% confidence interval are 
correct, we stand only a 5% chance 
of error. (Note, moreover, that this 
probability of error pertains to the 
incorrect simultaneous “rejection” 
of a major part of the total set of 
alternative hypotheses, not just to 
the incorrect rejection of one as in 
the NHD method, and is a total like- 
lihood of error, not just of Type I 
error.) The confidence interval is also 
a simple and effective way to convey 
that all-important statistical datum, 
the conditional probability (or proba-
bility density) function—i.e., the 
probability (probability density) of 
the observed outcome under each 
alternative hypothesis, since for a
given kind of observed statistic and 
method of confidence-interval deter- 
mination, there will be a fixed rela- 
tion between the parameters of the 
confidence interval and those of the 
conditional probability (probability 
density) function, with the end
points of the confidence interval
typically marking the points at which
the conditional probability (proba-
bility density) function sinks below a
certain small value related to the
parameter α. The confidence-interval
report is not biased toward some 
favored hypothesis, as is the null- 
hypothesis significance test, but makes 
an impartial simultaneous evaluation 
of all the alternatives under consider- 
ation. Nor does the confidence inter-
val involve an arbitrary decision as
does the NHD test. Although one
person may prefer to report, say,
95% confidence intervals while an- 
other favors 99% confidence inter- 
vals, there is no conflict here, for 
these are simply two ways to convey 
the same information. An experi-
mental report can, with complete
consistency and some benefit, simul- 
taneously present several confidence 
intervals for the parameter being 
estimated. On the other hand, differ-
ent choices of significance level in the
NHD method are a clash of incompati-
ble decisions, as attested by the fact
that an NHD analysis which simul-
taneously presented two different
significance levels would yield a logi-
cally inconsistent conclusion when
the observed statistic has a value in
the acceptance region of one signifi-
cance level and in the rejection region
of the other.
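
The typical construction described above, in which an α-level confidence interval collects every hypothesis under which the p value of the outcome exceeds 1−α, can be sketched for the simplest textbook case. The sketch assumes a normally distributed sample mean with known standard deviation; the data values, the grid of candidate means, and all function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(sample_mean, mu, sigma, n):
    """Two-sided p value of the observed mean under the hypothesis 'true mean = mu'."""
    z = (sample_mean - mu) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def interval_by_inversion(sample_mean, sigma, n, level, grid):
    """Keep every candidate mean whose p value exceeds 1 - level; return the extremes."""
    kept = [mu for mu in grid if two_sided_p(sample_mean, mu, sigma, n) > 1.0 - level]
    return min(kept), max(kept)


def interval_closed_form(sample_mean, sigma, n, level):
    """The usual textbook interval, for comparison with the inversion result."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical data: observed mean 103, known sigma 15, n = 25.
GRID = [90.0 + 0.01 * i for i in range(2601)]  # candidate means from 90 to 116

for level in (0.95, 0.99):
    print(level,
          interval_by_inversion(103.0, 15.0, 25, level, GRID),
          interval_closed_form(103.0, 15.0, 25, level))
```

Under these assumptions the two constructions agree to within the grid spacing, and the 99% interval simply contains the 95% interval, so the two reports supplement rather than contradict one another.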


Technical note: [...] derived from confidence-interval theory, fidu-
cial-probability theory (a special case of the
former in which the estimator is a sufficient
statistic), and classical (i.e., Bayes’) inverse-
probability theory. While the interpretation
of confidence intervals is tricky, it would be a
mistake to conclude, as the cautionary re-
marks usually accompanying discussions of
confidence intervals sometimes seem to imply,
that the confidence-level α of a given confi-
dence interval I should not really be construed
as a probability that the true hypothesis, H,
belongs to the set I. Nonetheless, if I is an
α-level confidence interval, the probability
that H belongs to I as computed by Bayes’
theorem given an a priori probability distri-
bution will, in general, not be equal to α, nor
is the difference necessarily a small one; it is
easy to construct examples where the a
posteriori probability that H belongs to I is
either 0 or 1. Obviously, when different tech-
niques for computing the probability that H
belongs to I yield such different answers, a
reconciliation is demanded. In this instance,
however, the apparent disagreement is largely
if not entirely spurious, resulting from dif-
ferences in the evidence relative to which the
probability that H belongs to I is computed.
And if this is, in fact, the correct explanation,
then fiducial probability furnishes a partial
solution to an outstanding difficulty in the
Bayes approach. A major weakness of the
latter has always been the problem of what to
assume for the a priori distribution when no
pre-experimental information is available other
than that supporting the background assump-
tions which delimit the set of hypotheses un-
der consideration. The traditional assump-
tion (made hesitantly by Bayes, less hesi-
tantly by his successors) has been the “prin-
ciple of insufficient reason,” namely, that
given no knowledge at all, all alternatives are
equally likely. But not only is it difficult to
give a convincing argument for this assump-
tion, it does not even yield a unique a priori
probability distribution over a continuum of
alternative hypotheses, since there are many
ways to express such a continuous set, and
what is an equilikelihood a priori distribu-
tion under one of these does not necessarily
transform into the same under another. Now,
a fiducial probability distribution determined
over a set of alternative hypotheses by an ex-
perimental observation is a measure of the
likelihoods of these hypotheses relative to all
the information contained in the experimental
data, but based on no pre-experimental in-
formation beyond the background assump-
tions restricting the possibilities to this par-
ticular set of hypotheses. Therefore, it seems
reasonable to postulate that the no-knowledge
a priori distribution in classical inverse proba-
bility theory should be that distribution
which, when experimental data capable of
yielding a fiducial argument are now given,
results in an a posteriori distribution identical
with the corresponding fiducial distribution.


4. While a confidence-interval anal- 
ysis treats all the alternative hypoth- 
eses with glacial impartiality, it 
nonetheless frequently occurs that 
our interest is focused on a certain 
selection from the set of possibilities. 
In such case, the statistical analysis 
should also report, when computable, 
the precise p value of the experimen- 
tal outcome, or better, though less 
familiarly, the probability density at 
that outcome, under each of the 
major hypotheses; for these figures 
will permit an immediate judgement 
as to which of the hypotheses is most 
favored by the data. In fact, an even 
more interesting assessment of the 
postexperimental credibilities of the 
hypotheses is then possible through 
use of “likelihood ratios” if one is 
willing to put his pre-experimental 
feelings about their relative likeli- 
hoods into a quantitative estimate. 
For let Pr(H,d), Pr(d,H), and Pr(H) 
be, respectively, the probability of a 
hypothesis H in light of the experi- 
mental data d (added to the informa- 
tion already available), the probabil- 
ity of data d under hypothesis H, and 
the pre-experimental (i.e. a priori) 
probability of H. Then for two alter-
native hypotheses H₀ and H₁, it fol-
lows by classical theory that

\[
\frac{\Pr(H_0, d)}{\Pr(H_1, d)} = \frac{\Pr(H_0)}{\Pr(H_1)} \times \frac{\Pr(d, H_0)}{\Pr(d, H_1)} \tag{1}
\]


2 When the numbers of alternative hypoth- 
eses and possible experimental outcomes are 
transfinite, Pr(d, H) = Pr(H, d) = Pr(H) = 0 in
most cases. If so, the probability ratios in 
Formula 1 are replaced with the correspond- 
ing probability-density ratios. It should be 
mentioned that this formula rather idealisti- 
cally presupposes there to be no doubt about 
the correctness of the background statistical 
assumptions.

Therefore, if the experimental report
includes the probability (or proba-
bility density) of the data under H₀
and H₁, respectively, and its reader
can quantify his feelings about the
relative pre-experimental merits of
H₀ and H₁ (i.e., Pr(H₀)/Pr(H₁)), he
can then determine the judgment
he should make about the relative
merits of H₀ and H₁ in light of these
new data.
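
The likelihood-ratio computation just described can be sketched numerically. The example assumes two point hypotheses about the mean of a normal sampling distribution; the observed mean, the standard error, the prior odds, and the function names are hypothetical. In the notation used here, Pr(H, d) is the probability of H in light of d and Pr(d, H) is the probability (density) of d under H.

```python
from statistics import NormalDist


def posterior_odds(prior_odds, likelihood_h0, likelihood_h1):
    """Posterior odds of H0 against H1: prior odds times the likelihood ratio."""
    return prior_odds * (likelihood_h0 / likelihood_h1)


def odds_to_probability(odds):
    """Probability of H0 when H0 and H1 are assumed to be the only alternatives."""
    return odds / (1.0 + odds)


# Hypothetical report: observed sample mean 104 with standard error 3.
standard_error = 3.0
observed_mean = 104.0

# Probability densities of the observed mean under H0 (mean 100) and H1 (mean 106).
density_h0 = NormalDist(mu=100.0, sigma=standard_error).pdf(observed_mean)
density_h1 = NormalDist(mu=106.0, sigma=standard_error).pdf(observed_mean)

# A reader with no prior preference (odds 1) and one who favored H0 four to one.
for prior in (1.0, 4.0):
    odds = posterior_odds(prior, density_h0, density_h1)
    print(prior, round(odds, 3), round(odds_to_probability(odds), 3))
```

The same reported densities serve both readers; only the prior odds each supplies differ, which is the division of labor between report and reader suggested above.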

5. Finally, experimental journals 
should allow the researcher much 
more latitude in publishing his statis- 
tics in whichever form seems most 
insightful, especially those forms 
developed by the modern theory of 
estimates. In particular, the stran- 
glehold that conventional null-hy- 
pothesis significance testing has 
clamped on publication standards 
must be broken. Currently justifiable 
inferential algorithm carries us only 
through computation of conditional 
probabilities; from there, it is for 
everyman’s clinical judgment and 
methodological conscience to see him 
through to a final appraisal. Insist- 
ence that published data must have 
the biases of the NHD method built
into the report, thus seducing the un-
wary reader into a perhaps highly
inappropriate interpretation of the 
data, is a professional disservice of 
the first magnitude. 


SUMMARY 


The traditional null-hypothesis sig-
nificance-test method, more appro-
priately called “null-hypothesis de-
cision [NHD] procedure,” of statis-
tical analysis is here vigorously ex-
coriated for its inappropriateness as
a method of inference. While a num-
ber of serious objections to the meth-
od are raised, its most basic error
lies in mistaking the aim of a scien-
tific investigation to be a decision,
rather than a cognitive evaluation of
propositions. It is further argued
that the proper application of statis-
tics to scientific inference is irrevo-
cably committed to extensive con-
sideration of inverse probabilities,
and to further this end, certain sug-
gestions are offered, both for the
development of statistical theory
and for more illuminating application
of statistical analysis to empirical
data.


GLUTAMIC ACID AND HUMAN INTELLIGENCE¹


ALEXANDER W. ASTIN² AND SHERMAN ROSS
University of Maryland 


Glutamic acid administered in
supranormal quantities has been re-
ported to enhance the intellectual
functioning of mentally defective pa-
tients. Interest in this relationship
was stimulated in part by the work of
Weil-Malherbe (1936), who reported
that l(+) glutamic acid was the
only one of 12 amino acids studied
which was capable of maintaining
oxygen uptake in sliced brain tissue.
Since that time a major literature has
accumulated relating glutamic acid to
human intelligence, brain function,
epilepsy, audiogenic seizures, and 
performance of rodents in mazes. In 
this paper we will review the litera- 
ture which evaluates glutamic acid 
therapy as a means of improving in- 
tellectual functioning in mental de- 
fectives. 

Studies of this problem ordinarily 
employ institutionalized mental de- 
fectives who are placed on a high diet 
of glutamic acid for a specified period 
of time (usually two to six months). 
Psychological tests and/or clinical 
observations of intellectual function- 
ing are made before and after treat- 
ment. 

Inspection of this literature re- 
veals a set of highly conflicting find- 
ings. The picture is somewhat clari- 
fied, however, when the studies are 
classified in terms of (a) whether or 
not positive results were reported, 
and (b) whether or not a control 
group was used. Table 1 summarizes 


1 Now with the Veterans Administration, 
Baltimore, Maryland.

33 studies with mentally deficient Ss
in terms of these two variables. A 
study was classified as positive if sig- 
nificant gains in IQ scores or improve- 
ments in “intellectual functioning” 
were attributed to treatment with 
glutamic acid. Studies employing 
“controls” were liberally designated 
as those in which a similarly diag- 
nosed group was studied in the ab- 
sence of glutamic acid medication. A 
chi square with correction for con- 
tinuity is significant at the .001 level 
(χ² = 12.99), indicating that positive
results tend to be related to a lack of 
controls. Obviously, the crucial 
studies occur in the Control-Positive 
cell of Table 1. An attempt will be 
made here to evaluate these studies 
according to the following additional 
methodological considerations: (a) 
adequacy of control group; (b) use of 
placebo; (c) control of “environ-
mental stimulation,” such that all Ss
are treated similarly except for drug 
administration; (d) ignorance of Ss 
and Es of the medications or place- 
bos; (e) statistical treatment of data; 
and (f) control of taste differences 
between drug and placebo (some 
forms of glutamic acid have a mark- 
edly bitter taste). 
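
The chi-square test reported above can be re-derived from the four cell frequencies of Table 1 (6, 13, 14, and 0). The sketch below is an illustration, not the authors' own computation: it assumes the standard Yates continuity correction for a 2×2 table, and the function name is hypothetical. Under that assumption it yields a value of about 13.1, close to the reported 12.99 (the small difference may reflect rounding or a slightly different form of the correction) and beyond the tabled .001 critical value for one degree of freedom.

```python
def yates_chi_square(a, b, c, d):
    """Chi square with Yates continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink |ad - bc| by n/2, but never below zero.
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cell frequencies as listed in Table 1:
# rows = Control / No Control, columns = Positive / Negative results.
chi2 = yates_chi_square(6, 13, 14, 0)

# Tabled critical value of chi square for 1 df at the .001 level.
CRITICAL_001 = 10.828

print(round(chi2, 2), chi2 > CRITICAL_001)
```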

Albert, Hoch, and Waelsch (1946) 
reported positive clinical results with 
[...] mental defectives serving as
their own controls. Intelligence test
performance in general seemed to 
improve during glutamic acid medi- 
cation and drop during placebo ad- 
ministration. However, some tests 
showed both gains and losses during 
acid and placebo periods, and no 
statistical analysis was attempted.

TABLE 1
33 STUDIES CLASSIFIED IN TERMS OF RESULTS AND USE OF CONTROLS

            Positive Results                                  Negative Results

Control     Albert, Hoch, & Waelsch (1946, 1951)              Berguis (1954)
            Foale (1952)                                      Ellson, Fuller, & Urmston (1950)
            Head (1955)                                       Ernsting (1949)
            Kurland & Gelash (1953)                           Kantor & Boyes (1951)
            Zimmerman, Burgmeister, & Putnam (1948)           Kerr & Szurek (1950)
                                                              Loeb & Tuddenham (1950)
                                                              Lombard, Gilbert, & Donofrio (1955)
                                                              McCulloch (1950)
                                                              Milliken & Standen (1951)
                                                              Oldfelt (1952)
                                                              Quinn & Durling (1950a)
                                                              Zabrenko & Chambers (1952)
                                                              Zublin & Lutz (1953)
            f=6                                               f=13

No Control  de la Fuente Muniz, Zuniga, & Yanowsky (1950)
            Delay, Pichot, Puech, & Perse (1951)
            Harney (1950)
            Hoven (1951)
            Kane (1953)
            Levine (1949)
            Müller (1953, 1954)
            Quinn & Durling (1950b)
            Schwobel (1950, 1952)
            Zimmerman & Burgmeister (1950)
            Zimmerman, Burgmeister, & Putnam (1949a, 1949b)
            f=14                                              f=0


Moreover, it was not reported if the 
testors had knowledge of the periods 
of medication, and there was no con- 
trol of taste differences between acid 
and placebo. 

The same investigators (1951) re- 
ported a more adequately designed 
follow-up of their first report. Con- 
trols were matched on MA, IQ, age, 
and sex. Significant gains in IQ
scores obtained during glutamic acid
medication dropped significantly dur-
ing placebo periods. Testors were
not informed of the medication sched-
ules, but “. . . testing of the pa-
tients was more difficult during glu-
tamic acid periods as later judged by 
the number of times it was recorded 
that a patient had been negativistic, 
distractible, etc.” (p. 487). Obvi-
ously the testors possessed informa-
tion which may have enabled them 
to differentiate the control and ex- 
perimental Ss. With regard to the 
taste variable, “... rare questions 
regarding differences in taste were 
explained as due to differences in the 
strength of the compound admin-
istered” (p. 475). The potential ef-
fects of such information upon the
parents (who administered the medi-
cation) and the children provide an
unknown, hence uninterpretable,
source of variance.

Zimmerman, Burgmeister, and
Putnam (1948), using 69 mentally de-
fective children, reported significant 
gains in IQ scores during glutamic 
acid medication. Thirty-seven of 
these Ss served as “controls” in the
sense that they had been given an in-
telligence test prior to the pretest for
[...]. The interval between
the two testings, however, ranged
from six months to eight years. Rec-
ognizing the potential inadequacy of
such controls, the authors wrote,
“. . . abrupt environmental changes
might possibly influence the intelli-
gence test score. This was not an im-
portant factor . . . since . . . the daily
pattern of their lives was not appre-
ciably changed” (p. 597). In another
[...] they stated, “Overdoses
. . . produce distractibility . . . in[...]
[and] occasional gastric
distress” (p. 594). Describing dosage
procedures in a preliminary report
(1947), “Glutamic acid was admin- 
istered... to the point where in- 
creased motor activity was present, 
or where parents complained about 
the distractibility and noncoopera- 
tiveness of the child” (p. 175). It 
certainly appears as though “abrupt
environmental changes” did occur at 
least for some Ss, although no method 
for evaluating their specific contribu- 
tion in this situation was provided. 

A study by Foale (1952) em- 
ployed two groups of 15 mentally 
defective boys equated on age, IQ, 
and length of institutionalization. 
Gains in IQ scores among the treated 
Ss were attributed to glutamic acid 
medication, although no placebos 
were employed for control Ss, no con- 
trol on testor knowledge was re- 
ported, and no statistical tests were
attempted.

Kurland and Gelash (1953)
matched 13 adult male mental defec-
tives with 13 controls on age and IQ 
scores. Significantly greater gains in 
IQ scores were reported for the ex- 
perimental Ss. No attempts to con- 
trol taste differences were made and 
it was not revealed if the testors knew 
the medication schedules. 

The final and perhaps most care-
fully designed Control-Positive study
was carried out by Head (1955). 
Three groups of 30 children each 
(schizophrenics, mental defectives, 
and normals) were employed in a 
cross-over procedure which lasted 
for three one-month periods. Sub-
groups of 10 Ss received different 
combinations and orders of medica- 
tion and placebo. The author’s con- 
clusion of a significantly beneficial 
effect attributable to glutamic acid 
is based primarily on a comparison of 
two subgroups of mental defectives 
before and after their initial month 
in the study. Mean changes in IQ 
scores of 6.0 and 3.0 were reported 
for the experimental and control 
groups, respectively, during this in- 
terval. However, in computing the 
standard error of this mean difference 
(p. 120), Head incorrectly employed 
the standard errors of the two pre- 
and posttest mean differences, in- 
stead of the standard deviations of 
the two distributions of difference 
scores. When appropriate substitu- 
tions are made, the t ratio (df=18)
drops from 3.52 (p<.01) to 1.12 (p 
>.05). Head controlled taste differ- 
ences, but no control over testor 
knowledge was reported. 
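
The correction described above can be illustrated with a minimal sketch. The gain scores below are hypothetical and are not Head's data; only the subgroup size of 10 (which gives df = 18) and the mean gains of 6.0 and 3.0 come from the text. The function name `gain_score_t` is an assumption introduced for illustration. The standard error of the difference between the two mean gains is built from the variability of the two distributions of difference scores, pooled across groups.

```python
from math import sqrt
from statistics import mean, variance


def gain_score_t(exp_gains, ctl_gains):
    """Independent-groups t ratio for the difference between two mean gains."""
    n1, n2 = len(exp_gains), len(ctl_gains)
    # Sample variances of the two distributions of difference (gain) scores.
    v1, v2 = variance(exp_gains), variance(ctl_gains)
    df = n1 + n2 - 2
    # Pool the two variances, then form the standard error of the mean difference.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se_diff = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(exp_gains) - mean(ctl_gains)) / se_diff, df


# Hypothetical posttest-minus-pretest IQ gains, 10 Ss per subgroup.
exp_gains = [2, 10, 4, 8, 6, 6, 0, 12, 5, 7]   # mean gain 6.0
ctl_gains = [-1, 7, 1, 5, 3, 3, -3, 9, 2, 4]   # mean gain 3.0

t, df = gain_score_t(exp_gains, ctl_gains)
print(f"t = {t:.2f}, df = {df}")
```

With gains as variable as these, a three-point difference in mean gain falls short of the .05 level for 18 degrees of freedom, which is the kind of outcome the recomputation in the text describes.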

The methodological criticisms of 
these studies do not, of course, render 
the authors’ interpretations invalid; 
but the amount of contradictory ex- 
perimental evidence (Control-Nega- 
tive cell of Table 1) raises consider- 
able doubt. Seven studies (Ellson,
Fuller, & Urmston, 1950; Kantor &
Boyes, 1951; Kerr & Szurek, 1950;
Loeb & Tuddenham, 1950; Lom-
bard, Gilbert, & Donofrio, 1955;
Milliken & Standen, 1951; Zabrenko
& Chambers, 1952) controlled both
testor knowledge and taste differ- 
ences between placebo and glutamic 
acid. Six studies (Ellson, Fuller, & 
Urmston, 1950; Loeb & Tuddenham, 
1950; Lombard, et al., 1955; Milliken
& Standen, 1951; Oldfelt, 1952;
Quinn & Durling, 1950a) employed 
matching procedures in the selection 
of Ss. 
Some specific points of comparison 
can be made between certain Con- 
trol-Positive and Control-Negative 
studies. The negative study by Kan- 
tor and Boyes (1951) was intended as 
a replication of the earlier Albert-
Hoch-Waelsch study (1946). Im- 
provements in the design included 
larger Ns, more homogeneous and 
more reliable diagnoses, control of 
testor knowledge, and control of 
taste differences. Two negative 
studies (Ellson, et al., 1950; Loeb & 
Tuddenham, 1950) attempted to rep- 
licate the positive study by Zimmer- 
man, Burgmeister, and Putnam (1947, 
1948). The main improvement in the 
design of these replications was the 
use of a control group which was 
studied concurrently with the experi- 
mental group. Ellson, Fuller, and 
Urmston reported gains in IQ scores 
for all Ss which were comparable to 
those reported by Zimmerman, Burg- 
meister, and Putnam during the med- 
ication period. Loeb and Tudden- 
ham, in attempting to explain the 
lack of change in IQ scores found dur- 
ing the six-month to eight-year con- 
trol period of the Zimmerman-Burg- 
meister-Putnam study, point out that 
facilitative practice effects on IQ tests 
are less likely to appear over such 
relatively long periods of time. 
Three other Control-Negative
... 1949; Quinn &
... brenko & Cham-
... tives to a setting
... “increased attentio ...
support” was followed by significant
increases in IQ scores.

Himwich (1954) has discussed one 
aspect of this research which has been 
given little attention, that is, the 
form in which glutamic acid is usu- 
ally administered. She points out 
that some studies have employed the 
unneutralized acid, while others have 
used the hydrochloride or salt (usu- 
ally sodium glutamate). Presenting 
data from one S to demonstrate that 
the unneutralized acid does not enter 
the bloodstream as efficiently as the 
hydrochloride or salt, Himwich con- 
cludes, “It is interesting that the ma- 
jority of favorable reports . . . have 
been obtained with sodium glu- 
tamate or the hydrochloride” (p. 352). 
Even though some of the papers re- 
viewed are rather obscure on this 
matter, a check on those listed in
Table 1 does not support Himwich’s 
observation. As far as could be de- 
termined, all of the studies in the 
Control-Positive cell of Table 1 used 
the unneutralized acid. On the other 
hand, at least three of the Control- 
Negative papers (Ellson, et al., 1950; 
Loeb & Tuddenham, 1950; Milliken 
& Standen, 1951) used sodium glu- 
tamate. 

Rogers and Pelton (1957) recently 
reported positive effects on IQ scores 
using a closely related compound, 
glutamine. The authors used small 
doses for a relatively short time (six 
weeks), and concluded that ‘some 
types of mental deficiency are asso- 
ciated with an impaired ability to 
synthesize glutamine from glutamic 
acid” (p. 88). Although Ss were care- 
fully matched and testor knowledge 
was controlled, the results were equiv-
ocal (p<.10>.05) and no control of
taste differences was attempted. 
These workers plan to publish studies 
with larger Ns in the near future. 

Another aspect which has received 
no systematic study is that of side ef- 
fects. Very large doses of glutamic
acid can result in flushing (Fincle &
Reyna, 1958), nausea and vomiting 
(Himwich, Wolff, Hunsuker, & Him- 
wich, 1955), in addition to the dis- 
tractibility, hyperactivity, etc., men- 
tioned earlier. These effects appear 
to be temporary, and disappear with 
the cessation or reduction of medi- 
cation. The importance of such ef- 
fects for the present review is that 
their appearance enables observers to 
differentiate between control and ex- 
perimental Ss. Moreover, the fact 
that experimental Ss who develop 
such side reactions will require addi- 
tional care and attention introduces 
another “environmental stimulation” 
which may contaminate the findings. 
A final point is that of diet. It is
likely that the diets of chronic pa- 
tients in certain institutions are in 
many respects deficient. The com- 
plexity of amino acid metabolism 
and the difficulty in defining which 
amino acids are essential for adequate 
nutrition are well known (Woods, 
1950). The effects of glutamic acid 
medication may depend to a large ex- 
tent upon the recent dietary history 
of the population under study.

In conclusion, it appears that a
specific effect of supranormal amounts 
of glutamic acid upon human intelli- 
gence has yet to be convincingly 
demonstrated. The more carefully 
designed studies tend to be negative 
almost without exception. Needless 
to say, the classification of studies in 
Table 1 provides testimony in sup- 
port of the demand for controlled ex- 
perimentation. 


SUMMARY


A review of the literature relating 
glutamic acid medication to the in- 
tellectual functioning of mental de- 
fectives indicates that positive effects 
tend to be reported in studies not em- 
ploying a control group. The few 
positive studies employing controls 
contain methodological flaws, render- 
ing their conclusions difficult to ac- 
cept. The tendency for negative find- 
ings to occur in the more adequately 
designed experiments sheds doubt on 
the hypothesis that glutamic acid 
medication has a specifically bene- 
ficial effect on intellectual function-
ing.

In his general review of the area
of psychotherapy in 1946, Snyder 
(1947) expressed optimism and fore- 
saw that this field was at least in the 
early stages of becoming a science.
He saw as a “commendable trend”
the fact that the scientific approach
was being more widely used in the
study of all methods of therapy and
pointed out that the measurement of
outcome was undergoing objectifica-
tion. Since the time of that paper at 
least 400 studies have been published 
in which some effort was made to 
evaluate the effects of psychotherapy.
Despite this extensive research ac-
tivity, there are some (Eysenck,
1952) who have questioned whether
anyone has adequately demon-
strated that psychotherapy is effec-
tive.

Evaluatory research in psycho- 
therapy is a most complex activity 
but an extremely important one if we 
are to understand more about the 
nature of what can bring about per- 
sonality change. The practical and 
theoretical problems involved in ac- 
quiring Ss, developing meaningful 
controls, and making measurements 
are enormous. Add to these the 
question of what one should measure, 
that is, what criterion should be used, 
and the complexity is increased 
many times over. 

It is the purpose of this paper to
summarize and evaluate some of the
approaches which have been used to
deal with the problem of the criterion.
This is based largely on an exhaus-
tive survey of the many experiments,
proposals for experiments, theoretical
papers, and some reports of case
studies involving the evaluation of in-
dividual or group psychotherapy
which have appeared in the major
American psychological and psy-
chiatric journals between 1946 and
1959 (Zax & Klein, 1958). In this
context Snyder's (1947) definition of
psychotherapy has been adopted
which rules out studies devoted to
educational procedures and guidance
activities emphasizing the giving of
information, as well as social ac-
tivities, occupational therapy, shock
therapy, chemotherapy, etc.

1 The authors are grateful to E. L. Cowen
of the University of Rochester for reading the
manuscript and offering many pertinent
criticisms and suggestions.

The present review is divided into 
two major sections devoted to (a) 
criteria based on client behavior in 
the therapy situation or his personal 
report and (b) criteria based on the 
client’s behavior outside of the ther- 
apy situation. The studies cited in 
this paper are selected as being 
illustrative of these two approaches. 
Phenomenological measures and in- 
dices of client behavior within the 
therapy situation have been used in 
some of the major systematic pro- 
grams for evaluating psychotherapy 
(Rogers & Dymond, 1954; Snyder, 
1953). Measures of extratherapeutic 
behavior represent logically a most 
important yardstick. In addition to
these two major approaches, psy- 
chological tests have also found fre- 
quent use as criteria, but, because of
space limitations, it was felt that they
might better be reviewed separately. 


INTRATHERAPEUTIC BEHAVIOR AND 
PHENOMENOLOGICAL CRITERIA 


Criteria based on S's self-experi- 
ence and his behavior within the 
therapy situation have stemmed 
largely from the work of the client-
centered group who have actively
studied their treatment approach. In
their research program, they have
developed a few instruments which
were directly intended to serve as
outcome criteria and several indices
which have important implications
for outcome.

Seeman (1954) constructed a meas-
ure which has found considerable use
both as a criterion of therapeutic
... Dymond, 1954).
... nine-point scales, several of which re-
quired the counselor to evalu...
Many of these in-
... are implicit dimen-
sions of the global judgment of suc-
cess to which it was compared.

The other instrument which has 
been used as a criterion in a number 
of studies (Snyder, 1953) was de- 
veloped by Tucker (1953), who
termed it the “multiple criterion.”
This involved a Client Post Therapy 
Scale which was essentially a self-as- 
sessment device in which the client 
was asked to rate his feelings toward 
such things as the possibility of 
having problems in the future, the 
status of the problem which brought 
him to treatment, relationship with 
immediate family, sexual adjust- 
ment, relationship to others, etc. 
Another measure as part of the cri- 
terion was the Counselor Post Ther- 
apy Check List which involved 29 
items referring to the client's be- 
havior during therapy and was based 
on a careful review of therapy notes 
and interview recordings. This
check list was filled out by both the
therapist and in each case by one
other of a group of trained raters
... transcribed interview ...
Finally, the first and last
interviews in each case were analyzed
as to the number of positive and
negative emotional statements made
by the client and an index derived by
dividing number of negative state-
ments by the sum of negative and
positive ones.
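
The emotional-statement index just described can be sketched in a few lines. The counts are hypothetical, and the function name `negative_statement_index` is an assumption introduced here; the source gives only the verbal rule (negative statements divided by the sum of negative and positive ones), not a scoring procedure.

```python
def negative_statement_index(negative, positive):
    """Negative statements divided by the sum of negative and positive ones."""
    total = negative + positive
    if total == 0:
        return None  # no emotional statements were scored in this interview
    return negative / total


# Hypothetical statement counts for the first and last interviews of one case.
first_interview = negative_statement_index(negative=30, positive=10)
last_interview = negative_statement_index(negative=8, positive=24)
print(first_interview, last_interview)
```

A drop in the index from the first to the last interview would indicate a shift toward positive emotional statements.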

The client’s self-report which was
an integral part of Tucker's “mul-
tiple criterion” has been used as the
sole criterion at times and represents 
the most direct phenomenological 
measure of therapy outcome. In- 
vestigations using such measures 
have ranged from those employing 
elaborate rating devices, with some 
effort at standardizing the procedure, 
to ratings based on relatively un-
standardized interviews in which the
S is asked to describe his present
state or changes which may have oc-
curred as the result of therapy.
Fiedler’s study (1949) serves as an 
example of the former. He had Ss 
fill out a 10-item self-rating scale 
with each item scaled from 0 to 12. 
The items referred to emotional ten-
sions related to the stress of taking
academic examinations and to 
changes as the result of psychother- 
apy. 

In studies using less systematic 
self-evaluative techniques, like that 
of Lipkin (1948), general questions 
have been asked such as, “What
seemed to go on during your visits
here?” “How do things look to you 
now?” Responses were evaluated 
subjectively and the clients’ descrip- 
tions of their experience in therapy 
and its effect on them were seen to 
confirm the expectations of Rogerian 
theory. 

Cowen and Combs (1950) used a 
third approach for eliciting the 
clients’ evaluation of therapeutic 
progress. They conducted open-
ended follow-up interviews which
were recorded and evaluated by
three judges as being “successful,
progress, or failure” cases.

Other instruments have been de- 
veloped which elicit self-descriptions 
from the client. While such descrip- 
tions have not been a direct evalua- 
tion of the therapy experience itself, 
they have implications for the effects 
of therapy and have been used as out- 
come measures. In one study, Butler 
and Haigh (1954) used a Q sort in- 
volving 100 self-referent statements 
which had been randomly selected 
from available therapy protocols. Ss 
were required to sort these to de- 
scribe themselves as they were at the 
time on a “like-me” to “unlike-me”
continuum. They were further asked
to make sortings which would de-
scribe their own ideal on a “like-
ideal” to “unlike-ideal” continuum.
The investigators reported signifi-
cant increases in the correlation be-
tween self and ideal sorts of clients 
who underwent therapy despite the 
fact that the same clients failed to 
show such changes on the same sorts 
made before and after a waiting pe- 
riod prior to the beginning of ther- 
apy. A no-therapy control group also 
failed to demonstrate such changes. 
Cartwright (1957) found a significant 
relationship between success in treat- 
ment as rated by the therapist and an 
increased consistency in the sorting 
of the Butler-Haigh items when three 
self-sorts were made each using differ-
ent people as interacting reference
points. 
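
The self-ideal correlation reported by Butler and Haigh can be illustrated with a small sketch. Everything concrete below is an assumption made for illustration: eight statements instead of 100, five piles, and invented placements. The only element taken from the text is the idea of correlating a client's self sort with his ideal sort across the same statements.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical pile placements for eight statements (1 = least like, 5 = most like).
# Each sort uses the same forced distribution of pile sizes.
ideal_sort = [5, 4, 4, 3, 3, 2, 2, 1]
self_before = [1, 2, 4, 3, 3, 2, 4, 5]
self_after = [4, 4, 5, 3, 3, 2, 2, 1]

r_before = pearson_r(self_before, ideal_sort)
r_after = pearson_r(self_after, ideal_sort)
print(f"self-ideal r before: {r_before:.2f}, after: {r_after:.2f}")
```

A rise in the correlation from the pre-therapy to the post-therapy sort is the kind of change the investigators reported for treated clients.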

Rosenthal (1955) constructed a 
Morals Value Q-Sort comprising 100 
statements which the S sorted into 
two piles as being relatively more or 
less descriptive of himself. This was 
administered to the patients in his 
sample before and after treatment 
and the therapists involved also made 
the sort. His findings were that pa- 
tients judged as improved tended to 
revise more values in the direction of 
those of the therapist than the un- 
improved. 

Dymond (1953) selected 74 of the 
Butler and Haigh items which two 
non-client-centered psychologists had 
sorted into two equal piles as being 
characteristic of the well adjusted on 
the one hand and of the poorly ad- 
justed on the other. These in turn 
were given to four other judges who 
sorted them independently in a sim- 
ilar fashion and a high degree of 
agreement was found. Ss were then 
given an adjustment score based on 
how many of either kind of state- 
ment appeared on the “like-me” or
“unlike-me” sides of their sortings.
She found scores on this Q-adjust- 
ment scale, as it was termed, to move 
toward good adjustment following 
therapy (1953, 1954). Cartwright 
and Roth (1957) found the correla-
tion of a client’s self and ideal sorts to
be related to the Q-adjustment sort 
and the client’s self-rating on the 
Willoughby Emotional Maturity 
scale. Although Dymond (1953) had 
not found differences in the Q-adjust- 
ment scores after a two-month inter- 
val during which her Ss were waiting 
to enter therapy and ultimately did, 
Grummon (1954) did find significant 
changes in this type of score among 
Ss who requested treatment but then 
decided against it when it was avail- 
able. In this case a two month inter- 
val had also elapsed between tests. 
Dymond (1955) re-examined the Q 
sorts of Grummon’s Ss and con- 
cluded that 

although positive adjustment changes appear 
to take place in maladjusted persons in the 
absence of psychotherapy, these are not iden- 
tical with the changes which occur in equally
maladjusted persons who complete therapy
(p. 107). 


She denied that any “deep” reorgan- 
ization takes place and saw the im- 
provement as characterized by “a 
strengthening of neurotic defenses 
and a denial of the need for help.” 
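
One plausible reading of the Q-adjustment score discussed above can be sketched as follows. The scoring rule is an assumption: the text says only that the score depended on how many statements of either kind fell on the “like-me” or “unlike-me” side, so the sketch counts well-adjusted statements placed on the “like-me” side plus poorly adjusted statements placed on the “unlike-me” side. The item numbers and the function name `q_adjustment_score` are hypothetical.

```python
def q_adjustment_score(like_me, unlike_me, well_adjusted, poorly_adjusted):
    """Count statements sorted in the direction judged to reflect good adjustment."""
    return len(like_me & well_adjusted) + len(unlike_me & poorly_adjusted)


# Hypothetical judged item sets (the study used 74 items; eight are shown here).
well_adjusted = {1, 2, 3, 4}
poorly_adjusted = {5, 6, 7, 8}

# Hypothetical sortings by one client before and after therapy.
before = q_adjustment_score({1, 5, 6, 7}, {2, 3, 4, 8}, well_adjusted, poorly_adjusted)
after = q_adjustment_score({1, 2, 3, 5}, {4, 6, 7, 8}, well_adjusted, poorly_adjusted)
print(before, after)
```

Under this reading, a higher score after therapy corresponds to movement toward good adjustment.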

A number of studies of personality 
change as seen in the therapeutic in- 
teraction have implications for cri- 
teria, especially insofar as these 
changes have often been related to 
direct evaluations of outcome. Sny- 
der (1945), following a pioneer in-
vestigation of the therapy process by 
Porter (1943), made the earliest of 
such studies. He classified client
...
or accepting or rejectin...
of feeling, responses ...
into remedies for a ...
... responses which were unrelated to the
principal problem of the client. A
second dimension for clients’ state-
ments was identified as expressions of
feeling and nine categories were set
up to classify them. These described
attitudes expressed in clients’ state-
ments as being positive, negative, or 
ambivalent with reference to the self, 
the counselor, or other persons or 
situations. As a result of his analysis 
of nearly 10,000 client responses in
the 48 interviews he used, Snyder 
concluded that there was a marked 
tendency for the client’s feelings to 
change in affective tone from nega- 
tive to positive. Further he noted 
that in his attitude toward the coun- 
selor the patient was slightly reject- 
ing at first, and indifferent during 
most of the treatment; but in the last 
interview or so, there was a marked 
increase in positive attitudes. He
also interpreted his findings as indi-
cating that “clients approaching the
end of treatment show an excellent
amount of insight into the nature of
their problem.”
In another of the early studies of 
personality change with psychother- 
apy, Raimy (1948) was concerned 
with changes in self-concept. He an- 
alyzed client responses in a set of 14 
cases by classifying statements into 
six categories. These involved self- 
references which were positive, nega- 
tive, ambivalent, and ambiguous; 
statements which did not involve 
self-references; and nonrhetorical 
questions. He found that in cases 
considered successfully treated on the 
basis of the judgments of the coun- 
selor, the supervisor of most of the 
cases, and Raimy himself, the client 
went from a preponderance of nega- 
tive and ambivalent self-references to
a preponderance of positive self-refer-
ences. This was taken to support the
hypothesis that in successful therapy
a positive change in self-concept took
place.
Several measures of client experi-
ence were developed in a series of
studies of the process of psychother-
apy in a single sample of 10 cases at
the University of Chicago. Changes 
in the clients’ experience reflected by 
these measures were found by Raskin 
(1949) to be associated with success 
in therapy as judged by the coun- 
selor. Thus, in the more successful 
cases clients showed an increase in 
acceptance of, and respect for, self as 
measured by a scale developed by 
Sheerer (1949); an increase in posi- 
tive and objective attitudes directed 
toward the self as measured by a 
scale developed by Stock (1949); a 
tendency toward more mature behav- 
ior as judged from the client’s own 
verbalizations in therapy (Hoffman, 
1949); and a decrease in defensive-
ness as measured by Haigh (1949). 

In a later study of his own, Raskin 
developed a four-step scale, illus- 
trated at each point by three ex- 
amples of client statements, on the 
basis of which judges estimated 
whether the client, in what he said, 
was being governed largely by the ex- 
pectations of others or by his own 
values and standards. Ratings on 
this “locus of evaluation” scale were 
found to correlate significantly with 
therapists’ ratings as to the success 
of treatment and with the five par- 
allel interview measures described in 
the previous study, but not with 
rated change on the Rorschach. 

In a later study of the changes in 
personality in successful psychother- 
apy, seen phenomenologically, Var- 
gas (1954) measured self-awareness 
in three ways and related increase on 
his measures to a number of criteria 
of outcome. He summarized his find- 
ings by saying: 

The conclusion which seems to follow from 
these observations is that the hypothesis—
increasing self-awareness during therapy cor-
related with success in therapy—is confirmed
when success is measured by instruments
which rate highly those changes and states 
deducible from client centered theory (p. 165). 


It should be noted that nearly all 
of these studies relating personality 
change in psychotherapy to judg- 
ments of the general outcome of 
treatment involve a certain circular- 
ity. In nearly all cases the judgment 
as to outcome was made by people 
holding theoretical viewpoints sim- 
ilar to those of the researchers who 
developed the scales for measuring 
change. It is, therefore, likely that 
the two measures were not com- 
pletely independent. 

A few measures of changes in cli- 
ents’ verbal behavior within the ther- 
apy interaction have been developed 
outside of the client-centered frame- 
work. One of these was the Discom- 
fort Relief Quotient (henceforth re- 
ferred to as DRQ) which was first 
proposed by Dollard and Mowrer 
(1953). This measure classifies 
words, clauses, or sentences as to 
whether they signify discomfort, re- 
lief from discomfort, or a neutrality 
of emotion. To arrive at the quotient 
the number of discomfort words, 
clauses, or sentences is divided by
this same number plus the number of 
relief words, phrases or clauses. 
Thus, the quotient may vary from 
zero to one, with scores nearer zero 
representing a preponderance of ex- 
pressions of relief and those ap- 
proaching one indicating considerable 
expression of discomfort. Dollard 
and Mowrer made no claim that the 
DRQ measured “success” in treat-
ment. To do this they felt that it 
must first be related to a reliable 
measure of “real life success.” 
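
The computation of the DRQ described above can be sketched as follows. The label strings, the example units, and the function name `drq` are assumptions for illustration; the source specifies only the three classes of units and the quotient itself. Units judged neutral are counted but do not enter the quotient.

```python
from collections import Counter


def drq(unit_labels):
    """Discomfort units divided by discomfort plus relief units."""
    counts = Counter(unit_labels)
    discomfort, relief = counts["discomfort"], counts["relief"]
    if discomfort + relief == 0:
        return None  # only neutral units were scored
    return discomfort / (discomfort + relief)


# Hypothetical classification of twelve units from one interview.
labels = ["discomfort"] * 6 + ["relief"] * 2 + ["neutral"] * 4
print(drq(labels))
```

Values near one reflect mostly discomfort, and values near zero reflect mostly relief, matching the range given in the text.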

Several attempts have been made 
to validate the DRQ as a measure of 
success in therapy. Hunt (1949a,
1949b) applied it in a social casework
setting and found that changes in
DRQ failed to correlate significantly
with judgments of improvement 
made by case workers. Other studies 
(Assum & Levy, 1948; Cofer & 
Chance, 1950; Murray, Auld, & 
White, 1954) reported analyses of the 
published protocols of cases pre- 
sented by the therapist as successful, 
with two finding the predicted change 
and the third finding no relationship. 
Kauffman and Raimy (1949) de- 
rived a related measure from Raimy’s 
self-concept categories (described 
above). It consisted of the number of 
negative self-references plus the num- 
ber of ambivalent self-references di- 
vided by the number of negative self- 
references plus the number of ambiv- 
alent self-references plus the num- 
ber of positive self-references (more 
conveniently termed the PNAvQ). 
Using this quotient, they analyzed 17 
verbatim interviews and compared 
their analysis with an analysis of the 
same protocols using the DRQ. They 
concluded that both methods traced 
changes from maladjustment to ad-
justment in a similar fashion. They
also noted that PNAvQ judgments 
were obtained in about one-third the 
time required for DRQ judgments. 
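
Written as a formula, the verbal definition of the PNAvQ given above reads as follows. The symbols are assumptions introduced here for compactness: P, N, and Av stand for the numbers of positive, negative, and ambivalent self-references, respectively.

```latex
\[
\mathrm{PNAvQ} = \frac{N + Av}{N + Av + P}
\]
```

Like the DRQ, this quotient ranges from zero (only positive self-references) to one (only negative or ambivalent self-references).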
Another study of the nature of
... was recently re-
ported by Berg (1958). He ...
an eight-interview protocol of a case
published by Rog...
and proposed tha...
... frequency count ...
words (I, me ...
... words and
“expletive-bombastic sounds” at var-
ious points in ...
... negative, and
expletive-bombastic expressions de-
creased with succeeding interviews.

Most recently, Rogers (1958) has
developed and given a preliminary 
report on a scale of process levels in 
psychotherapy which bears consid- 
erable significance for the measure- 
ment of the effects of successful 
psychotherapy. Again, his goal was 
a further understanding of the nature 
of change in personality from a the- 
oretical framework rather than meas- 
urement of outcome. He conceived 
that clients move “not from fixity or 
homeostasis through change to a new 
fixity . . . but much the more signifi- 
cant continuum is from fixity to 
changingness, . . .” He hypothe-
sized that the nature of clients’ im- 
mediate relationship to their feelings 
at any point in the therapeutic inter- 
action might indicate their position 
on a seven-stage continuum. 


BEHAVIORAL CRITERIA 


In many instances, studies of the 
results of psychotherapy have used 
criteria which depend on an evalua- 
tion of the way the patient actually 
behaves without inference as to its 
personal meaning for him. Such in- 
dices were generally developed di- 
rectly as criteria for use in a given 
situation and were not related to a
theoretical framework about per-
sonality change.

Of the many studies which have 
used behavioral criteria, certain ones 
have been particularly noteworthy 
in that they dealt with crucial as- 
pects of behavior which can be objec- 
tively established. The simplest of 
such criteria focused on relatively
circumscribed individual behaviors
which were seen to be central to the
person’s difficulty in living. The more
complex criteria attempted to assess
wider, more representative areas of
functioning through the use of elab- 
orate rating scales. 

A study by Friedman (1950) is
typical of those employing criteria
... delimited behaviors
... the person's diffi-
... tients com...
travel,” which ca... phobi...
... can be objectively
... evaluation was based on
... travel after treatment
... was found that 12 patients
... improved, 15 showed some
... and 23 were com-
pletely recovered. Another example
of a study utilizing a single symptom
which bore implications for a much
wider range of behavior was that of
Teuber and Powers (1951). They
simply totaled the number of court
appearances among a large group of
... juvenile delinquents who
had received treatment and made
comparisons with a matched control
group which had received no treat-
ment. No significant differences were
found between groups on this meas-
ure.

A variation in the use of an im-
portant individual behavior as a cri-
terion was introduced by Thetford 
(1952) who derived an autonomic 
measure of frustration tolerance. 
This study stands out in that the be- 
havior which was measured was not a 
specific complaint but depended on 
the theoretical consideration that 
therapy should reduce anxiety and 
tension so that the manner in which 
one responds to stress as reflected in 
the autonomic nervous system should 
be altered. He developed a “Recov-
ery Quotient” based on various Gal- 
vanic Skin Response measures and 
found significant changes as the re- 
sult of psychotherapy which indi- 
cated the development of a higher 
frustration threshold. 

The criterion used by Pascal and 
Zax (1956) likewise involved objec- 
tive behavioral measures, but these 
varied with the individual patient, 
reflecting presenting complaints. 
These complaints were evaluated for
30 cases which had undergone various
types of treatment. In 28 of these, 
changes in the predicted direction 
were found. 

Institutional settings have made 
it possible to study wider samples of 
behavior objectively. In such set- 
tings Cowden, Zax, Hague and Fin- 
ney (1956); Fox (1954); and Ludwig 
and Ranson (1947) have used multi- 
ple but individually significant be- 
haviors as their criteria. Cowden et
al. (1956) considered the number of
times hospitalized patients required 
neutral wet packs, electroconvulsive 
maintenance shock, or engaged in 
fights, in addition to such indications 
of improvement as transfer to a ward 
requiring a higher level of integration 
or discharge from the hospital. They 
concluded that patients who received 
group psychotherapy in addition to 
tranquillizing drugs showed more 
improvement than various control 
groups. To evaluate the effects of 
counselling programs in a prison, Fox 
(1954) used such behavioral criteria 
as work stability, school stability, 
financial budgeting, reports from 
chaplain, block officers, and work 
supervisors, successful discharge from 
parole, and return to prison as a pa- 
role violator; he found counseled 
groups had significantly higher ad- 
justment scores on such indices than 
similar uncounseled groups. In a re- 
port of results of psychiatric treat- 
ment among soldiers, Ludwig and 
Ranson (1947) reported that rela-
... tions and ...
ing officers indica...ed that mo...
them were able to perform their
services adequately.

... have made use of
elaborate rating
scales which attempted to assess the
extratherapy functioning of the in-
dividual on the basis of diverse be-
havioral observations. One of the
older instruments of this type which 
was used in the evaluation of treat- 
ment with children (Gersten, 1951; 
Mehlman, 1953) is the Haggerty- 
Olson-Wickman Behavior Rating 
Schedules (Jones, 1941). This con- 
sists of two separate schedules (A and 
B) the first of which (A) lists 15 prob- 
lems such as cheating, lying, defiance 
of discipline, speech difficulties, sex 
offenses, obscene notes, talk, or pic- 
tures, etc. Raters checked in one of 
four columns according to the fre- 
quency of occurrence of each for a 
given individual. Standardized weights 
were assigned according to the fre- 
quency and seriousness of a given 
problem. The other schedule (B) 
comprised a series of 35 graphic five- 
point rating scales covering traits 
which may be classified according to 
intellectual, physical, social, and emo- 
tional traits. On the basis of ratings 
made before and after group ther- 
apy with juvenile delinquents, Ger- 
sten (1951) reported progress in emo-
tional security and social maturity
among his subjects. Mehlman (1953),
who used the scale to rate mentally 
retarded children before and after 
group therapy, found significant in- 
creases in adjustment at the time of 
the second rating. 
Of the many devices which have 
been used to evaluate change in 
hospitalized patients, perhaps the 
most promising and certainly the
most searching are the Palo Alto
Hospital Adjustment Scale (McReyn-
olds & Ferguson, 1953) and the Lorr
Multidimensional Scale (Lorr, 1953).
The Palo Alto scale consists of 90
descriptive statements applicable to
psychiatric patients. Examples of
these statements are, “the patient ig-
nores the activities around him” or
“the patient's talk is mostly not sen-
sible.” Each one is marked as true, 
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not true, or does not apply, for a 
particular patient and is keyed in
such a manner that a general hospital
adjustment score can be obtained.
The scale was designed to be filled
out at intervals by ward personnel
who are familiar with the patient's
behavior. On this measure, schizophrenics
were seen to improve with
group psychotherapy (Semon & Goldstein,
1957). In another study (Wilcox
& Guthrie, 1957) items from this
scale were combined with others sug- 
gested by personnel in an institution 
for defective children, and by this 
index group therapy was found to be 
effective.
The Lorr scale consists of 62 brief 
rating scales which are directed to- 
ward observable or inferable patient 
behavior. Many of the items refer to
relatively objective behaviors con- 
cerning which judgments should be 
quite reliable, such as bizarre postures,
speech peculiarities, orientation,
eating, sleeping, assaultiveness.
On the other hand many other items 
refer to aspects of behavior which are 
probably less reliably rated such as
emotional responsiveness, attitude 
toward himself, suspiciousness, re- 
currence of useless thoughts, etc. The 
use of this scale was reported in a
study with long term schizophrenic
patients who were seen by this measure
to have improved significantly
more than a control group (Funk, 
Shatin, Freed, & Rockmore, 1955). 
The scales used to evaluate out- 
patients have as a rule been more dif- 
ficult to apply and often have been 
more complex. This is due to the ob- 
vious fact that the behavior of the 
nonhospitalized patient is less limited 
by the structured aspects of institutional
life so that he functions in a
much wider range. Observation is 
thereby also made more difficult. 
Hunt (1949a, 1949b) has attempted 
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o “movement” in social 
= a by developing a criterion 
as sy the DRQ. Movement 
i e as the change which ap- 
Me eat an individual client and/or 
the ironmental situation between 
hers ai closing of his case” 
set up in ahs p. fO His ong as 
biti Gan, t ps ranging rom 
wis and, hrough zero, to plus four 
tf the nchoring illustrations at each 
a se three points. It was found 
the th Ser ik i workers could use 
ak a e reliably, but no relationship 

RQ mna between movement and 

Th changes in the course of therapy- 
PR ae eae oe Emotional Ma- 

: y Scale (1931) has been used by 
in gers (1954) to evaluate changes in 
hanna Jt consists of 60 
items descriptive of varying
levels of maturity of functioning.
The levels had been determined by
100 judges who sorted a large
number of statements along a nine-point
continuum. The 60 items selected
for the scale were representative
of the nine levels of maturity
and were ones on which there was 
high agreement among judges. In 
Rogers’ study, each client was rated 
by himself and two personal friends 
whom he designated. Although intra- 
rater reliability was high, interrater 
reliabilities were all low. Conceiv- 
ably, this scale might have higher 
reliability in the hands of trained ob- 
servers although this might limit its 
use to a somewhat standardized 
setting such as a dormitory or school 
setting. 

Miles, Barrabee, and Finesinger
(1951) developed a series of five-point
scales covering the general
areas of (a) symptoms; (b) social adjustment
including functioning in the
areas of occupation, marriage, interpersonal
relations, and sex; (c) insight;
and (d) life situation since hospitalization.
As a group, the scales
were comprehensive and individual 
steps were well described. On the 
basis of these instruments, overall 
evaluations were made of patients 
and summarized in the categories 
“apparently recovered, much im- 
proved, improved, slightly improved, 
unimproved, and worse.” In using 
this measure to assess a group of 62
cases two years after treatment, they 
found that 58% had improved in 
varying degrees while 42% were un- 
changed. Imber, Frank, Nash, Stone, 
and Gliedman (1957) derived a Social 
Ineffectiveness score on the basis of a 
series of six-point scales which ap- 
plied to each of 15 behavioral cate- 
gories concerning the patient’s rela- 
tionships with the significant indi- 
viduals in his life (spouse, sibs, chil- 
dren, parents, boss, etc.). Some of 
the categories were overly independ- 
ent, withdrawn, superficially socia- 
ble, extrapunitive, officious, impul- 
sive, etc. Using this scale they in- 
vestigated the relationship between 
improvement and amount of thera- 
peutic contact, and they found less 
improvement for patients with re- 
stricted therapy contacts than for 
those with more frequent ones. 
Raush, Dittman, and Taylor (1959) 
have made a recent contribution to 
the methodology of making observa- 
tions and developing behavioral cri- 
teria to assess change with treatment. 
Working in a residential treatment 
setting for children, they standard- 
ized their observations of six male Ss 
and systematically studied samples 
of their behavior in a variety of set- 
tings including mealtimes, play pe- 
riods, and an arts-and-crafts pe- 
riod. One set of observations was 
made early in the children’s stay at 
the center and another 18 months 
later focusing on interpersonal be- 
havior at these two points in time. 
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Objective observations were recorded 
and later rated on a scale based on 
two polar coordinates: love (affiliate, 
act friendly) to hate (attack, act un- 
friendly) and dominate (command, 
high status action) to submit (obey, 
low status action). More striking 
changes were found in the relation- 
ships of these children to adults than 
in their relationships to their peers. 


Discussion 


As is the case with any measure of 
personality, a criterion for evaluat- 
ing the effects of psychotherapy must 
satisfy the requirements of reliability 
and validity. The latter usually poses 
the more serious problem in that no 
absolute state of complete validity 
exists as a standard. In dealing with 
this problem we generally conclude 
that a given measure is valid for certain
specified purposes and not necessarily
valid for others. Therefore,
we may have a variety of “valid” 
measures of the outcome of psychotherapy.
The judgment of whether
these are useful measures, however, 
must be based upon our evaluation of 
the purposes for which they are valid. 
The criteria which have been re- 
viewed will be considered in the light 
of such issues.

Perhaps the simplest and most
direct means of assessing a client’s
progress in treatment is to ask him to
evaluate his own status. Such a
phenomenological approach has often


unconscious
distortions; finally, the client’s evalu- 
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ation of his condition may be affected 
by conscious or semiconscious mo- 
tives. In positing the “‘hello-good- 
bye” effect, Hathaway (1948) has 
warned of the subtle social influ- 
ences which limit the reliability of 
many of the phenomenological meas- 
ures which have been made. On en- 
tering treatment the client is under 
the conventional pressure to justify 
his appeal for help so that problems 
are discussed freely. When seeking 
to terminate, however, he feels an 
obligation, out of courtesy toward 
one who has attempted to help, to ex- 
press gratitude and satisfaction. 
A fundamental weakness of the phenomenological
approach would,
therefore, seem to reside in the diffi- 
culty in obtaining reliable assess- 
ments. It seems likely that the con- 
tent of such assessments depends 
greatly upon who asks for it and the 
circumstances under which it is re- 
quested. 

Intratherapy behavior, usually verbal
behavior, lends itself to measurement
and has been used often as a
criterion. In many of the studies re- 
porting the use of such criteria a 
single theoretical system, that of 

Rogers, has guided the expectations
of researchers. As a result many of
these studies relate to each other in a
more systematic fashion than is usually
the case with outcome studies.
The aspects of verbal behavior which 
have been studied by the client-cen- 
tered group have usually been care- 
fully defined and found to be reliably 
measured. Designed to explore per- 
sonality changes during psychother- 
apy rather than to be evaluators of 
psychotherapy, their significance for 
outcome measures is mostly by impli- 
cation for they remain unvalidated, 
not yet having been compared to an 
independent criterion. Used for the 
purpose of exploring changes, they


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Be compared in the published 
pone a to a judgmental crite- 
tah. he therapist who shared the 
same theoretical point of view as the
researcher and whose global judgment
could have included the con-
cept under study. 
Those intratherapy criteria which
have not stemmed from the work of
the client-centered group have found
relatively infrequent use and the one
attempt to relate change in DRQ to
an external criterion (Hunt,
1949a, 1949b) resulted in an
insignificant correlation.
The most serious failing at this
time in the use of phenomenological
measures and measures of intratherapy
behavior as criteria of outcome
is that neither has yet been related
to everyday, externally observable
behaviors in the life space of the Ss.
Until phenomenological changes and
changes in verbal behavior in therapy
are related to concomitant behavioral
changes in the family and
community their significance remains
unclear.
External measures of clients’ behavior
stand out as potential criteria
having validity for purposes which
are extremely important. However,
the attempt to use such criteria
is beset by a host of measurement
problems which are much more
difficult to resolve than is the case
with phenomenological and intratherapy
indices. The central problem
here is the development of criteria
of sufficient breadth that they
are meaningful and representative
of a wide range of functioning yet,
at the same time, circumscribed
enough to be measured with reliability.
The present review would seem to
indicate that the development of
such criteria is in the stage of infancy.
Many workers have been
able to reliably observe narrow as-
pects of functioning which had impli- 
cations for a wider range of behavior. 
In such cases, however, the possibil- 
ity remains that one circumscribed 
symptom was abandoned in favor of 
another which was equally or even 
more disabling. The assessment of 
broader areas of functioning has been 
carried on primarily within the con- 
fines of institutional settings where 
the patients’ range of functioning is 
limited. Perhaps the most glaring 
weakness in the way such criteria 
have been developed and applied is 
that there has been no unifying set 
of principles to guide observations. 
Consequently, the results which have 
been reported are fragmented. We 
are told of a variety of behavioral 
changes which take place as the re- 
sult of therapy but very few of these 
appear in any one study and even 
fewer are observed in more than one 
study. It would seem that the pres- 
ent need is for the development of a 
theory or even a set of loose hypothetical
notions about “normal” behavior
to guide our observations and
systematize our thinking.

It seems likely that one of the ob- 
stacles to the development of such a 
theory has been the reluctance of 
many psychologists to become em- 
broiled in the philosophical issues of 
the desirability of different behav- 
iors. Actually, the problem of mak- 
ing value judgments when one con- 
ducts research cannot be avoided. 
The very selection of the phenomena 
which will be observed and measured 
is in itself a judgment depending upon 
the values one holds. Indeed then, 
the further development of criteria 
for evaluating the effects of psychotherapy
awaits the clarification, resolving,
and communication of the
values we hold. 

One approach to the development 
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of a systematic set of values which 
may clarify our thinking about what 
behavior is generally considered “psychologically
desirable” would be to
formalize the notion of the client's 
relationship to social norms which 
was discussed by Pascal and Zax 
(1956). Their concern was with the 
behaviors on the part of the person 
presenting himself for treatment 
which were notably deviant from ex- 
pected social norms (i.e. overt homo- 
sexual acts, frequent crying spells, 
few friends) and the extent to which 
such behaviors were changed. Other 
writers have suggested that the 
clinician does generally function with 
a concern for such social norms. As 
the result of his work in the area of 
personality assessment, Edwards 
(1957) has suggested that the notions 
of the clinician about what consti- 
tutes disturbance in patients may 
correspond essentially to an opera- 
tional definition of what is socially 
undesirable. Cowen (in press) who 
was investigating the social desirabil- 
ity variable in personality assessment 
actually provided data which lends 
support to this idea. He found a correlation
of −.917 between the published
ratings of a group of clinicians
on 77 trait-descriptive terms scaled
for abnormality and the social desirability
ratings of the same terms by
67 undergraduate students of psychology.
This 


are characteristi- 
the light of such 
he functions in 
e felt experience 
or discomfort is 


cally considered in 


the same way that th 
of physical comfort 


evaluated on the basis of various 


measures of bodily function.

While this approach, which is 
probably implicit in the thinking and 
functioning of most clinicians, may 
provide a useful beginning to the de- 
velopment of criteria of what therapy 
should accomplish, it is unlikely that 
any single set of norms would apply 
to all. In essence, we are proposing 
that there are, contentwise, many 
“normal” or “healthy” personalities. 
That which is common to each is the 
ability to function in relation to the 
norms of his particular social setting. 
The uniqueness of each individual's 
social setting makes this a complex 
area of study and is undoubtedly dis- 
couraging. It may well develop, 
however, that what people have in 
common is important enough to per- 
mit the development of a relatively 
limited number of norms reflecting 
basic interpersonal environments 
which can be useful. At any rate, it
would seem that what is now needed
is a series of broad normative studies
of a personal-social psychological na- 
ture. In addition to providing norms 
which can be used as a foundation for 
behavioral criteria of “normality,”
they would provide a basis for de- 
termining just which dimensions of 
social group membership have significance
for actual functioning. The
availability of a criterion based on 
such indices would also provide a 
context in which to evaluate the significance
of changes in the experiencing
of Ss, either reported directly or
reflected in their intratherapy verbal
behavior. Ultimately, a combined
measure of related changes in ob- 
served behavior and experiencing 
might facilitate a common, com- 
municative frame of reference among 
workers of different orientations and 
be a basis for delineation of dimen- 
sions of personality change. 
The last review of the literature
pertaining to retroactive inhibition
(RI) was Swenson’s (1941)
monograph whose coverage extended
through 1940. The present paper extends
the coverage by presenting a
full bibliography and critical analysis
of all published reports on the RI and
proactive inhibition (PI) of verbal
learning from 1941 through 1959.
Studies of infrahuman Ss and of nonverbal
behavior were excluded because
of considerations of length and
the fact that, traditionally, RI is a
topic associated with verbal behavior.
Excluded also were studies
using interpolated convulsive seizures
or surgical procedures because
such treatments are qualitatively
different from intervening learning
and require other theoretical
concepts to explain their effects.
Following a brief summary of the
field to 1940, subsequent developments
will be discussed under five
headings: Degree of Acquisition,
Similarity of Materials, Extrinsic
Factors, Temporal Effects,
and Theoretical Positions.

The dominant theoretical position
was a transfer theory, given
its fullest exposition by McGeoch
and his collaborators. In essence the
theory stated that RI could be ex-


1 Thi 
Sra, : 
Ci 


S rk 
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ieee ee author from the National 
Science Foundation (G-6192).
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plained by the general principles dis- 
covered in the study of transfer. The 
failure of performance of an old asso- 
ciation could be attributed to greater 
strength of the new association, a 
mutual blocking of old and new asso- 
ciations, or a confusion between the 
two. 

This theory was capable of handling 
a great deal of the relevant data and 
depended largely upon two sources of 
evidence for empirical support. The 
first source was the evidence for the 
effect of similarity of materials upon 
RI, which supported the contention 
that RI could be explained by the 
principles of transfer. The second
source was intrusion errors, which are
responses from the interpolated learning
offered by Ss when they are asked
for responses from the original learning.
The existence of these errors supported
the contention that old responses
were not given because new
ones had supplanted them.

Much of the subsequent history of 
RI can be viewed as a process of extension
and enlargement of McGeoch’s
basic position. The four major theo- 
ries discussed later on in this paper 
serve as leading examples. The Mel- 
ton-Irwin two-factor theory enlarged 
the competition of response theory by 
postulating an unlearning process in 
addition to competition of response. 
Gibson elaborated the theory by 
placing it within the setting of the 
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conditioning experiment, making a- 
vailable the conceptual apparatus 
of differentiation and generalization. 
Underwood’s work has concentrated 
upon clarifying the nature of both 
unlearning and differentiation, while 
Osgood has stressed the communality 
of transfer and RI in his “transfer 
and retroaction surface.” 

A consideration of terms is now in 
order. RI is the decrement in reten- 
tion attributable to interpolated 
learning (McGeoch & Irion, 1952), 
and the operations that define it re- 
quire a comparison of the retention 
of some original learning (OL) be- 
tween two groups that differ in some 
aspect of the interpolated activity 
(IL) (Underwood, 1949a). The ex- 
perimental group has IL, and the 
control group engages in some non- 
learning filler task. Better retention 
in the control group defines RI, and 
better retention in the experimental 
group defines retroactive facilitation. 
Since the control group almost al- 
ways shows some loss of the OL 
after its “rest activity,” to what can 
the decrement be attributed: to 
incidental learning, to loss of set, to 
sheer metabolic activity (Shaklee & 
Jones, 1959)? The impossibility of 
assuring that no interpolated learn- 
ing takes place for tha 
duces an inevit 
significance of 
control group 
times assumed 
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which experimentally induced RI is 
calculated and renders comparison 
of results difficult. Osgood (1946, 
1948) has dealt with the problem by 
simply omitting the control group 
and regarding RI as the difference in 
performance between the end of OL 
and the subsequent OL relearning 
(RL), lumping together both the 
specific and nonspecific decremental
variables operating during the interpolated
interval. This, of course, is a
measure of total forgetting. Such a
straightforward procedure cannot,
however, distinguish between RI and
retroactive facilitation, as they are
usually understood, since facilitation
may involve simply less decrement in
retention as compared to a control
group. Another troublesome problem
arises with the other methods of
quantifying RI, both of which rely
upon control groups. Absolute RI is
simply the numerical difference between
the retention of the control
and experimental groups, and relative
RI is the percentage difference
between them:


$$\text{Relative RI} = \frac{\text{Rest} - \text{Work}}{\text{Rest}} \times 100$$


Each of these measures is thus dually
dependent upon both the experimental
and the control groups’ performance,
and they may not always
give the same pattern of results. This
problem becomes especially important
in studies of degree of OL upon
RI. It is often the case that as it
increases, absolute RI increases, but
relative RI decreases (Postman &
Riley, 1959). To illustrate, it can be
seen that, when degree of OL is low,
the control group’s retention is low
and even slight departures from this
baseline on the part of the experimental
group will represent a substantial
percentage difference; whereas,
when the control’s recall is high,
the same absolute difference will reflect
a lesser percentage change, and
relative RI will have decreased,
while absolute RI will have remained
the same. At present, we can only be
aware of this source of confusion
and take it into account when viewing
the results of any RI study. The
foregoing observations apply just as
fully to the quantification of PI, to
which we now turn.
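
The divergence between the two measures can be sketched numerically. The short Python example below is only an illustration under stated assumptions: the function names and the recall scores are hypothetical and are not taken from any of the studies reviewed. "Rest" stands for the control group's retention and "Work" for the experimental group's retention, following the formula given above.

```python
def absolute_ri(rest, work):
    """Absolute RI: control (rest) retention minus experimental (work) retention."""
    return rest - work


def relative_ri(rest, work):
    """Relative RI: the same difference expressed as a percentage of control retention."""
    if rest <= 0:
        raise ValueError("control-group retention must be positive")
    return (rest - work) / rest * 100


# Hypothetical mean recall scores (items recalled); illustrative values only.
conditions = {
    "low degree of OL": {"rest": 4.0, "work": 2.0},
    "high degree of OL": {"rest": 10.0, "work": 8.0},
}

for label, scores in conditions.items():
    abs_ri = absolute_ri(scores["rest"], scores["work"])
    rel_ri = relative_ri(scores["rest"], scores["work"])
    print(f"{label}: absolute RI = {abs_ri:.1f} items, relative RI = {rel_ri:.1f}%")
```

With these made-up scores the absolute difference is the same in both conditions, while the relative measure is much larger when the control baseline is low. A negative value on either measure would correspond to retroactive facilitation rather than inhibition.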
The PI paradigm requires a comparison
of the retention of some original
learning (List 2) between two
groups that differ only in some aspect
of the activity preceding that learning.
The experimental group learns
some prior material (List 1), and
the control group does not. The same
problem with regard to the control
group's experience applies here. Better
retention in the control group
defines PI, and better retention in
the experimental group defines proactive
facilitation. In addition, the
paradigm requires that a clear temporal
distinction be made between
the end of the acquisition phase of
List 2 and its subsequent retention
test; that is, a retention interval
longer than the OL intertrial interval
is needed. If this is not done, the
learning and retention phases would
be operationally identical, and the
PI design would be indistinguishable
from the transfer design.


DEGREE OF ACQUISITION


Swenson’s (1941) generalizations
about the acquisition variables were
as follows:


la]. 
fideo qe P ee: to retroaction does not 
activity paea the amount of original 
atenha ce ee 17). [b]. the 
activity oa of learning of th riginal 
Citta = eee the learning 
May retai ive inhibition (p. 18). [co]. -- we 
ìnhibitio; a the idea of increased retroactive 
n with increased amount of inter- 
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polated activity (p. 19). [d] All measures 
show an increase in retroactive inhibition with 
early increases in the degree of interpolated 
learning and a decrease in retroactive inhibi- 
tion with very high degrees of interpolated 
learning (p. 20). 


These conclusions have been further 
amplified through subsequent work. 
(Unless otherwise noted, the results 
cited below refer to measures at re- 
call—first relearning trial.) 

Several papers have reported the 
effect of degree of IL upon RI either 
by varying the number of IL trials 
(Briggs, 1957; Highland, 1949; 
Melton, 1941; Postman & Riley, 
1959; Slamecka, 1959, 1960a; Thune 
& Underwood, 1943; Underwood,
1945, 1950b), by setting a perform- 
ance criterion (Archer & Under- 
wood, 1951; Osgood, 1948; Richard- 
son, 1956), by varying the number of 
interfering lists (Underwood, 1945), 
or by analysis of the associative 
strength of any single IL list item 
(Runquist, 1957). 

Most of the papers agreed that RI 
of recall showed a negatively acceler- 
ated increase with increasing IL, and 
studies that carried IL to very high 
degrees also agreed that the curve 
tended to flatten out or even to de- 
crease (Briggs, 1957; Thune & Under- 
wood, 1943; Underwood, 1945). In 
general, maximum levels of RI were
obtained when the IL practice had 
somewhat exceeded the OL practice 
and further IL trials did not serve to 
increase the RI appreciably. An ex- 
ception to this was Runquist’s (1957) 
finding that RI of individual items 
was not a function of the strength of 
the corresponding interpolated items. 
Also, in Exp. B of Underwood's 
(1945) report, there were no signifi- 
cant recall differences among the 
work groups, nor was there any con- 
sistent trend toward a negatively 
accelerated curve of recall as a func- 


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tion of degree of IL. A. possible 
explanation for this may lie in the 
fact that the lowest IL degree (8 
trials) exceeded the mean OL trials 
(which averaged about 6). Under 
these conditions it might well be ex- 
pected that increasing the IL prac- 
tice would have no further decre- 
mental effect. Increasing the IL 
levels did, however, produce faster 
RI dissipation, which gives marginal 
support to Underwood's differentia- 
tion hypothesis. The question of 
whether degree of IL, measured by 
trials, or amount of IL, measured by 
the number of different interpolated 
lists given, is the more powerful 
variable in producing RI was also 
specifically tested by Underwood 
(1945). Care was taken to equate the 
amount and degree levels by equal 
total trials, and the findings showed 
that RI changed at a faster rate with 
increases in amount than with increases
in degree of IL. Both relative
and absolute RI grew steadily as the 
number of IL lists was increased, but 
the frequency of overt interlist intrusions
remained relatively constant,
regardless of the number of lists. This 
is also consistent with the differentia- 
tion hypothesis, since increasing the 
number of lists should not increase 
differentiation, whereas increasing 
the number of trials on a single list 
should increase it. It is urged that a 
further comparison of the effect of 
amount against degree of IL should 
be made, using yet lower IL levels, 
so as to fill out that part of the curve
at which acquisition is very slight.
Degree of OL was controlled in
the following studies by varying the 
number of trials (Brig 
ton, 1941; Post 
Shaw, 1942; S] 
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reports agreed that the ae. 
of the original material to RI was
inversely related to its level of acquisition.
The well-designed
study by Briggs (1957), using four
OL and five IL levels (2, 5, 10, and 20
trials OL, compared to 0, 2, 5, 10,
and 20 trials IL, all paired adjectives),
confirmed previous findings
as well as showing that, as OL increases,
the greater must the IL level
be for maximal relative RI. This was
also found by Melton (1941). Further,
Briggs reported more significant
recall differences across the
IL levels as degree of OL increased.
There was no additional information
concerning the effects of amount of
OL within this period.
PI as a function of List 1 acquisition
has been studied by varying the
number of trials (Postman & Riley,
1959; Waters, 1942), the number of
lists (Underwood, 1945), setting a
performance criterion (Atwater, 1953;
Underwood, 1949b, 1950a), and analyzing
individual item strengths (Runquist,
1957). Two other studies
(Greenberg & Underwood, 1950;
Werner, 1947) omitted control groups
and are not strictly PI designs, and a
third (Peixotto, 1947) did not distinguish
between learning and retention
measures. When significant PI of
recall was obtained, all but one of the
studies agreed that it was a positive
function of the degree or amount of
prior learning, and there was also
some indication that it leveled off at
high degrees of such learning, much
as with RI (Atwater, 1953). The one
exception (Runquist, 1957) showed
that PI was not influenced by the
degree of the corresponding interfering
item strength. The latter is the
only study that solely used such analysis
and poses an important but
separate question concerning the
variables determining the retention


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oe. ‘items per se. Underwood 
one found that PI was eliminated 
recall = aig of prior learning when 
interv: | was extended to 8-sec. 
intervals. McGeoch and Underwood
(1943), using paired-associates lists,
found that, when the pairs were presented
in fixed order, thus providing
the opportunity for serial learning,
significant PI was no longer obtained,
in contrast to the usual method in
which the order of the pairs is varied.
A further instance of the sensitivity
of PI to slight procedural changes was
given in a report that found significant
PI in a serial list at a 2-sec. rate
of presentation, but not at a 2.3-sec.
rate (Underwood, 1941).
A chronic problem which crops
up in studies of the degree of prior
learning upon PI (and also in RI
studies) is that of controlling for
practice and warm-up effects. Traditionally,
the control group learns
only List 2, whereas the experimental
group has had prior practice via
List 1. Taking List 2 to a common
criterion does not insure equal
degrees of learning since the rates
of acquisition may differ. Although
this has been recognized
(McGeoch & Irion, 1952), it is not
dealt with in most PI studies.
mentis (1955) is the only expert 
the effort at such control, wherein 
Ne serge was carried to a seven- 
ext = criterion on the hypothetical 
Dil rial, as determined by previous 
3 study data. 
The only study of PI as a function
of the degree of List 2 learning appeared
in the extensive investigation
by Postman and Riley (1959) who
used nonsense lists and naive
Ss. That part of their work revealed a
curvilinear PI (both absolute and
relative) function. Maximum PI was
obtained at the lowest and highest
degrees of List 2 acquisition (5 and 40
trials, respectively) across all levels
of List 1 training given (5, 10, 20, and 
40 trials). Runquist (1957) found 
that the degree of PI of any individual
list item is unaffected by the
acquisition strength of that item— 
again pointing up the discrepancy 
between single item retention and 
overall list retention. The study of 
PI has not kept pace with the grow- 
ing knowledge about RI, although 
recently the greater impact of long- 
range cumulative effects of prior 
learning have been brought out 
strikingly by Underwood (1957) who 
utilized data from previous retention 
work and showed that more forget- 
ting is attributable to long-range PI 
effects than to RI. He found that, 
although well-practiced Ss forgot 
about 75% over 24 hours, naive Ss 
(no practice lists) forgot only about 
30%. This large differential in reten- 
tion could only be attributed to the 
strong PI effects of the practice ma- 
terial. Further experimental support 
was given by Seidel (1959), measur- 
ing concurrent PI and RI. 

The transitory nature of RI and 
PI is exemplified in the common ob- 
servation that these phenomena dis- 
sipate after a few relearning trials, 
sometimes even by the second trial 
(Osgood, 1948; Underwood, 1945). It 
follows that recall is the most sensi- 
tive measure, whereas if a relearning 
criterion is used, no interference 
effects may be demonstrable (Mc- 
Geoch & Underwood, 1943; Thune & 
Underwood, 1943; Underwood, 1949b; 
Waters, 1942). 

The rate at which RI dissipates is 
undoubtedly some function of the degree
of learning, or the degree of
differentiation of the two response 
systems involved; but the form of the 
function is not completely known. 
Dissipation rate is of importance 
theoretically and empirically. Melton
and Irwin (1940) obtained fastest
dissipation at the highest IL level
used (40 trials), followed by the next 
highest level (20 trials). Thune and [...]

Briggs (1957) suggests that RI dissipates
fastest [...] material is w[...]
learned, only at low and intermediate
OL levels. RI [...] erally found t[...]
intermediate [...] as comparable
figures for rates of PI would be welcome.


SIMILARITY OF MATERIALS
Swenson’s (1941) [...] earlier work on
si[...] Robinson’s theor[...]ful)
and partly because a more
heuristic alternative has emerged.
The trend within this period may be 
traced from Boring’s (1941) mathe- 
matical discussion of communality; 
Gibson's (1940) more analytical theory
reflected in Hamilton’s (1943, p. 374)
statement that “a two-variable
hypothesis should be accepted in
preference to the Skaggs-Robinson
function”; through Haagen’s (19 9,
p. 44) conclusion that “the hypothesis
applies, not to any dimension of
similarity, but specifically to the
condition in which the continuum of
similarity involves a change in the
SR relationship of the tasks”; to
Osgood’s (1949) integration of the 
literature on RI and similarity in 
terms of his 3-dimensional transfer 
and retroaction surface. Ritchie
(1954) argued that the Skaggs-Robinson
paradox (the statement
that the point of maximal OL and IL
similarity is simultaneously the condition
for greatest interference and
also for greatest facilitation) is a
pseudoproblem because of an ambiguous
scoring procedure. In short,
this hypothesis has been superseded
by subsequent developments, to
which we now turn. Studies of the
effects of similarity relationships
have been separated into those using
paired associates and those using
serial lists. The use of paired associates
allows specification of the locus
of the change in similarity between
the lists, an advantage which is not
found with serial arrangements.
Three classes of change between
items are possible: response (A-B,
A-C), stimulus (A-B, C-B), and
both stimulus and response changes
(A-B, C-D).

The effect upon retention of learning
a new response to an old stimulus
has been to produce RI (Bugelski,
1942; Bugelski & Cadwallader, 1956;
Gladis & Braun, 1958; Haagen, 1943;
Highland, 1949; Osgood, 1946, 1948;
Young, 1955) and, also, retroactive
facilitation (Haagen, 1943; Parducci
& Knopf, 1958). The variable that
determined the direction of the effect
was the degree of similarity between
the responses. The problem of
developing a rigorously objective
[...] scale of meaningful similarity
along dimensions feasible for
use in verbal form is a serious one,
and has not been adequately met.
Usually, adjectives scaled for varying
levels of synonymity to standard
words were used. These levels were
determined by judges
(Haagen, 1949; Osgood, 1946). Parducci
and Knopf (1958) used geometric
figures varying along some
[...] dimension with four-digit
numbers varying in identity as the
[...] responses required. Their OL [...]
were visual discrimination
[...] not really paired associates.
The distinction is that the correct
response figure and numeral
[...] on the stimulus card,
whereas in the true paired associates,
the response is never a part of the
stimulus item. The theoretical rationale
of Young’s (1955) study
deserves some discussion. In the
A-B, A-C paradigm, learning A-B
adds to the associative strength
of A-C through generalized reinforcement.
The magnitude of such generalized
reinforcement should be a
function of the degree of
similarity between the B and C response
items. In the RI design it was
hypothesized that the original list's
associative strength (after the
IL was learned) would be the sum of
the direct reinforcement gained during
its acquisition plus the additional
generalized reinforcement gained
from the subsequent IL learning. The
IL, on the other hand, would already
have gained some generalized
reinforcement as a result of the OL
training and would thus need less
direct reinforcement to achieve criterion
during its learning. This would
leave the original list with a greater
associative strength at recall than 
the interpolated list, and the magni- 
tude of this difference would be de- 
termined by the degree of response 
similarity between lists. Therefore, 
it was predicted that, as response 
similarity between lists increased, RI 
would decrease and PI would in- 
crease. These predictions were tested 
by Young, using three lists of paired 
adjectives (to increase the effect) 
and three levels of response similar- 
ity. Results showed that RI as well 
as overt intrusions decreased as re- 
sponse similarity increased, as pre- 
dicted. The PI results, as well as a 
reinterpretation of this entire experi- 
ment, will be taken up at the end of 
this section. 

Osgood’s (1949) generalization that 
as response similarity decreases from 
identity to antagonism, retroactive 
facilitation should gradually change 
to increasing RI, was given some 
empirical support within this period. 
However, one disturbing finding has
emerged. Bugelski and Cadwallader
(1956) made a comprehensive attempt
to test Osgood’s generalizations
about similarity effects, part of
which involved the use of Osgood’s
own word lists to define four degrees
of response similarity (identical, similar,
neutral, and opposed) while
keeping the stimuli the same. Results
showed decreasing RI with decreasing
response similarity. There was
more RI with similar than with opposed
responses, a finding directly
contrary to Osgood’s prediction, and
not in accord with other data. No
explanation was given for these results,
but they cast doubt upon the
previous formulation of response
similarity. In addition to Osgood’s
disinclination to use RI control 
groups, he has also relied upon an 
uncommon measure of retention, 
namely, latency scores. In one of his 
studies (Osgood, 1948), the signifi- 
cant drop in RI between opposed and 
similar responses was evident only 
with latency scores, but traditional 
recall showed no significant differences.
In Osgood’s other study (1946)
there were no significant latency
differences at recall, but only on the
second and third relearning trials.
At no time were the differences be- 
tween the neutral and opposed con- 
ditions significant. All things con- 
sidered, the evidence in favor of the 
retroaction surface is less than over- 
whelming as far as the right half of 
the response dimension goes, and 
indicates that a revision is needed. 
Saltz (1953) hypothesized that 
learning A-C after A-B inhibits B. 
Assuming that inhibition generalizes 
less than excitation, presenting a 
slightly altered A stimulus should 
again tend to evoke B. When tested 
in a straightforward manner, the 
hypothesis was not confirmed. A 
second attempt, designed to mini- 
mize changes in set, did result in a 
tendency toward reappearance of B.
No further RI work along these lines 
has been reported. 
There have been two papers on the 
effects of response similarity on PI.
One reported no differential effect
(Young, 1955), although overt intrusions
increased with response similarity,
and the ot[...]

A methodological oversight
with consequent possible confounding
of the results of the Young (1955)
and Morgan and Underwood (1950)
studies should be pointed out. They
both varied similarity along the 
synonymity of meaning dimension. 
In terms of A-B, A-C, the C response 
varied from very high (i.e., discreet-ailing,
discreet-sickly), to very low
similarity, or neutrality with regard
to the B response (i.e., noiseless-sincere,
noiseless-latent). Each single
list had all of the responses at the
same similarity level. Thus, it is conceivable
that S could “catch on”
that the List 2 responses were similar
in meaning to those of List 1, and 
thereby reduce his chances of making 
errors by restricting his responses to
members of the synonym category, 
with a resulting high positive transfer 
and low apparent PI. This postulated 
shift in the pool of responses available 
to S could be made entirely without 
his awareness, as several studies of 
verbal operant conditioning have 
demonstrated. With lists of low simi- 
larity on the other hand, the possibil- 
ity of such an occurrence would be 
nil, and therefore no response class 
restriction would be made, resulting 
in a drop in positive transfer and 
higher apparent PI. Since these studies
address themselves to rote learning
and retention, the possibility of
such a form of concept formation is a
serious confounding variable. The
test of retention may not be of rote
recall at all, but actually of reconstruction
of the response on the basis
of the general concept of synonymity.
As would clearly be predicted by such 
a “categorization” approach, the 
learning of List 2 was in fact fastest 
with high response similarity and became
progressively slower with decreasing
similarity. Both studies
stressed the previously discussed response
generalization rationale which
would lead to increasing PI with increasing
response similarity, because learning
the List 2 response would add
to the interfering strength of the List
1 response through generalized or
[...] reinforcement. These predictions
were not in fact confirmed;
PI tended to decrease with
increasing similarity (although not
statistically significant), an expectation
consistent with the categorization
approach. The magnitude of
the effect is probably dependent upon
the [...] strength of the two lists,
as well as upon the number of alternative
response classes, which
calls for further empirical work
to verify. Such an unintended source
of bias may also have been working
in the Bugelski and Cadwallader
(1956) study, which used a similar
list construction technique. Preferably,
items at varying levels of response
similarity should be included
within the same list, so that S would
have no opportunity to grasp the
concept of the overall list structure.
This [...] was used for RI by
Osgood (1946, 1948) who was aware
of this [...]. A paper by Twedt
and Underwood (1959), which
showed that there was no difference
in transfer effects between “mixed”
and “unmixed” lists, is relevant to
lists differing only in formal characteristics
but does not bear upon the
question of the general synonymity
of the items as a whole. The lists
of that study were not varied in
response similarity
and thus do not constitute a test of
the [...] hypothesis. However,
an important paper by Barnes
and Underwood (1959) suggested a
mediation rationale as another possibility.
If A-B is the first list and
A-B′ the second, there is a possibility
of an A-B-B′ mediation occurring at
recall. In view of these complications,
we must conclude that the
effects of varying response similarity
still have not been unequivocally demonstrated
or explained.

The retention effect of learning 
the same response to a new stimulus 
was reported in four studies, all of 
which found retroactive facilitation 
(Bugelski & Cadwallader, 1956; 
Haagen, 1943; Hamilton, 1943; High- 
land, 1949). Similarity was varied 
either by using geometric figures 
differing in generalizability (origi- 
nally developed by Gibson, 1941) or 
meaningful words scaled for syn- 
onymity. The results agreed that 
retroactive facilitation increased with 
increasing stimulus similarity. The 
extreme of similarity is identity, and 
this produces the most facilitation of 
all since it amounts to continued 
practice on the original list. At levels 
of very low similarity there was some 
inhibition (Haagen, 1943), and according
to Hamilton (1943, p. 375):
“When the stimulus forms were of 
degree generalization there was very 
little difference in retention in condi- 
tions with responses identical and 
with responses different.” 

No study has ever tested the effects 
of opposed or antagonistic stimulus 
relationships while keeping responses 
the same. Osgood’s (1949) retroac- 
tion surface does not extend the di- 
mension of stimulus dissimilarity 
beyond “neutral” or unrelated, al- 
though the response dimension does 
include “antagonistic” relations. The
implication is that stimulus opposition
is no different in its effects from
stimulus neutrality, although no RI
evidence is adduced for such a position.
It is conceivable, however, that
meaningful stimulus opposition or
antonymity would actually result in
facilitation of recall, based upon a
mediation rationale, since such words
would be related by S's previous
language experience. If response opposition
is expected to differ in effect
from response neutrality, then stimulus
opposition might also. There are
no corresponding paired-associates
studies upon the PI effects of stimulus
variation.
The effect of changing both the
stimulus and response members of 
the interfering list is concisely stated 
by Osgood (1949, p. 135): “negative 
transfer and retroactive inhibition 
are obtained, the magnitude of both 
increasing as the stimulus similarity 
increases.” One experiment did not 
vary stimulus similarity with unre- 
lated responses (Highland, 1949), 
four studies did vary stimulus simi- 
larity with unrelated responses (Gib- 
son, 1941; Haagen, 1943; McClelland 
& Heath, 1943; Postman, 1958), and 
another used three degrees of re- 
sponse similarity as well (Bugelski & 
Cadwallader, 1956). The five latter 
reports indicate increasing RI with 
increasing stimulus similarity, and 
the one study available shows that 
this holds over all levels of response 
similarity tested. Two studies from 
this group will be more fully de- 
scribed since they represent an in- 
triguing departure from the use of 
the usual physical or meaningful 
similarity dimension. McClelland and 
Heath (1943) used as stimulus items 
for the original and interpolated lists, 
respectively, a Kent-Rosanoff stimu- 
lus word and the most frequent free- 
association response made to it. Thus 
an existing prepotent connection was 
deliberately introduced. Responses 
were unrelated, and there was no 
control group. Recall was signifi- 
cantly less under that condition as 
compared with the case in which 
there was no association between the 
stimuli. Since the related words were 
not similar in appearance or in mean- 
ing (e.g., Thirsty-Water) and since a 
common mediating response could
not account for the directionality of
the association, the authors concluded
that:

[...] relation between original [...] which determines the
amount of RI, as similarity or as generalization
(plain or mediated) is too narrow a
conceptionalization, since it does not cover [...]
a learned, uni-directional relation between
two activities as was demonstrated to be of
importance here (p. 429).

This study was not carried far enough
to prove the point. A third group is
needed, for which the related OL and
IL stimuli would be interchanged. If
this group would display no [...]
recall than the unrelated stimulus
group, then the case for the effect
of unidirectionality of relationships
upon RI would be established. Postman
(1958) used geometric figures as
OL stimuli. The IL stimuli were
either the identical figures, words
describing the figures (i.e., ‘square’),
or color names. Responses were unrelated.
Both the figure and word
groups showed significant RI, with
the former having the largest decrement,
while the color group did not.
These results were explained in terms
of the previously learned connections
between figures and their names, with
formal similarity producing greater
interference than mediated equivalence.
The influence of unidirectionally
prepotent and mediated connections
upon forgetting deserves even
more attention than it has yet received.
PI is once again slighted, for there
are no paired-associates studies concerning
both stimulus and response
changes.
We turn now to serial list studies,
divided into those employing discrete,
unconnected items, and those
using connected discourse or some approximation
thereto. Effects of similarity
relations between discrete item
lists were reported in three papers
which were relatively unrelated as regards
their major purposes. Irion
(1946) varied the relative serial positions
of the original and interpolated
adjectives, with some groups learning
the identical words for IL, and others
synonyms. He concluded
that similarity of serial position was
an effective variable only when identity
of meaning was also present.
Since several significant differences
for IL were reported, we feel that the
main variables were confounded with
the uncontrolled degree of IL, rendering
the results ambiguous. Melton
and von Lackum (1941), in a study
designed to test an important deduction
from the two-factor theory, used
two levels of similarity of interpolated
items, and found both RI and
PI greater under the high similarity
condition. Kingsley (1946), with
meaningful words, also found poorer
retention with interpolated synonyms
as opposed to antonyms. Both
of the above studies support the
generalization that, with serial lists,
RI increases with increasing stimulus
similarity, along dimensions of both
identical elements and meaningfulness.

Prose or connected discourse
has been, until recently, unusually
resistant to demonstrable
[...] effects. Blankenship and
Whitely (1941) studied PI of advertising
material (a simulated grocer's
handbill) as a function of two levels
of List 1 to List 2 similarity. Recall
after 48 hours showed greater PI for
the more similar condition. Their
study actually did not vary degrees
of similarity of prose, since one of the
lists was nonsense material, and
it might be questioned whether a
grocer's handbill resembles prose
more than a list of paired associates.
Hall (1955) in an RI design, using a
[...] test, gave 30 sentences
as OL, with IL being more sentences
varying in two levels of similarity of
topic. Results of that, and of a sec- 
ond, unpublished study, both showed 
no RI. Deese and Hardman (1954) 
found no RI for connected discourse 
under conditions of unlimited re- 
sponse time. Ausubel, Robbins, and 
Blake (1957), using the method of 
whole presentation, found no RI. The 
measure of both learning and recall 
was a recognition test, largely of sub- 
stance retention. Peairs (1958) did 
find RI using a recognition procedure; 
Slamecka (1959), using grouped Ss, 
reported that unaided written recall 
of a short passage was a negative 
function of the degree of similarity 
of topic the interfering passage bore 
to the original passage. 

On the whole, these results were 
rather discouraging about generaliz- 
ing RI findings from nonsense mate- 
rial to connected discourse and led to 
the view that prose was not susceptible
to RI, or at least to the similarity
variable (Miller, 1951, p. 220).
We feel, however, that the difficulty 
was not in the characteristics of con- 
nected discourse, but rather in the 
methods employed. It is noteworthy 
that all of the above studies employed 
the less well-controlled techniques of 
group testing, whole presentation, 
unlimited recall times, recognition 
tests, and the like. When, however, 
connected discourse was presented 
in the same manner as the traditional 
serial list, using the serial anticipa- 
tion method with individually tested 
Ss, significant RI was obtained, and 
it was clearly shown to be a function 
of degree of OL and IL, as well as of 
similarity of OL-IL subject matter 
(Slamecka, 1960a, 1960b). Any pre- 
sumption of the uniqueness of con- 
nected discourse with regard to these 
variables is no longer tenable, and 
the door is now open for further 
exploration of this area.

Errors in recognition and recall of
a story were shown to be a function 
of the interference provided by the 
interpolated presentation of a picture 
which bore some thematic resem- 
blance to the story (Davis & Sinha, 
1950a, 1950b). Similarly, Belbin 
(1950) showed that an interpolated 
recall test concerning an incidentally 
present poster interfered with the 
subsequent recognition of the poster. 
If the attempted recall is viewed as 
interfering with the original percep- 
tual trace, then the degree of OL and 
IL (recall test) similarity was deter- 
mined by each S’s own recall per- 
formance. 

Lying somewhere between the use 
of discrete, unconnected items and 
ordinary prose are two studies em- 
ploying lists of various orders of ap- 
proximation to English, constructed 
according to a method developed by 
Miller and Selfridge (1950). If RI is a 
function of contextual constraint,
then the use of such materials should 
be appropriate. 

Heise (1956) used an unrelated 
word list as OL, and five different IL 
levels of approximation to English. 
He found recall was best with the 
greatest dissimilarity between the 
lists. Thus, the seventh order IL list 
(close to English text) produced al- 
most no interference, whereas the 
first order list (same order as OL) 
produced a great deal, again supporting
the generalization concerning
greater RI with greater similarity
between serial lists. King and Cofer
[...] using OL lists at the [...]
and fifth orders, with four different
orders of IL at each of the OL levels.
Their intent was [...]ity effects at vari[...]
that the effects of contextual constraint
may prove to be more complex
than originally expected, and
called for further investigation.


EXTRINSIC FACTORS 


In this section are papers focusing
upon variables actually extrinsic to
the specific items being learned. In
most of these studies the groups
learned identical materials, and they
differed only with regard to such
things as the general surround, testing
methods, and sets.

The striking effects of altered
environment were shown by Bilodeau
and Schlosberg (1951). The two
groups differed only in the conditions
under which IL took place. One
group stayed in the same room for all
phases, and the other had the IL in a
dissimilar room with a different exposure
device and a changed posture
for S. Recall, done in the OL room,
indicated that IL interfered only
half as much when associated with a
different surround. Elaborating upon
this, Greenspoon and Ranyard (1957)
also used two different surrounds
(different rooms, posture, and exposure
devices designated as A and
B), in four combinations, and the results,
in terms of decreasing order
of recall were ABA, (AAA, ABB),
AAB (those within parentheses not
significantly different). Although no
controls were used, the findings
agree with those of Bilodeau and
Schlosberg. These studies support
the view that, since recall takes
place in some context, the cues
governing a response lie not only
within the learning material, but also
in the general surround, and that the
magnitude of RI is a partial function
of such context-carried cues. The
relative importance of the proprioceptive
vs. the exteroceptive cues
was not assessed.


Jenkins and Postman (1949) varied
testing procedures for OL and IL, using
anticipation (A) or recognition
(R), in four combinations. Results
showed a significant increase in recall
when procedures were different, under
only one of the comparisons
([...] A-R). The authors concluded
that using a different testing method
involves a change in set and “helps in the
functional isolation of materials
learned successively” (p. 72). Postman
and Postman (1948) gave four
groups the same materials, differing
only in the order of the S-R items.
Paired syllables-numbers for OL were
followed by either paired numbers-syllables
or more syllables-numbers.
The changed set groups showed better
retention. No control groups were
used. In the second part of the same
study, OL was paired words with
either a compatible (doctor-heal) or
incompatible (war-peaceful) relation
between them. For IL, half the Ss
learned a list with the same logical
relations, and half learned one with
the [...] relations to OL. This
latter group showed superior retention,
again attributed to the dissimilar
sets involved.

Comparing the effects of incidental
and intentional learning of OL
and IL, Postman and Adams (1956)
found that, regardless of the OL conditions,
intentional IL produced more
RI than incidental IL. Both intentional
and incidental learning were
equally susceptible to RI when followed
by IL of the same kind and
[...] as OL. The authors noted
that “Intentional practice resulted
in the learning of a longer number of
items during interpolation and hence
was a more effective source of interference”
(p. 328). Thus, it appears
that these conditions were simply
the means by which degree of IL,
the effective variable, was manipulated.
In an earlier paper, Prentice
(1943) concluded that incidental 
learning was more subject to RI than 
intentional, but when Postman and 
Adams (1956) corrected Prentice’s 
data by subtracting the respective 
control group scores, the results 
agreed with the Postman and Adams 
findings. If incidental and intentional 
conditions are construed as providing 
different sets, or “functional isola- 
tion,” then an experiment in which 
the degree of acquisition was equal- 
ized should be expected to give differ- 
ent results: the similarly treated 
groups should display more RI than 
the changed-set groups. Since this 
has not been done, we must conclude 
that the RI effects of incidental vs. 
intentional conditions per se are not 
yet known. 

The effect of the emotion-arousing 
characteristic of the IL upon reten- 
tion is an interesting question, but 
only one study attempted it within 
this period and produced inconclu- 
sive results (McMullin, 1942), prob- 
ably because of a confounded experi- 
mental design. Among the truly 
inherent subject variables that have 
been investigated is the effect of the 
age of S (Gladis & Braun, 1958; 
Wywrocki, 1957). The former study 
divided Ss into three age classes: 
20-29, 40-49, and 60-72 years. There
was no control group. Although a 
negative relationship between age 
and rate of learning was found, the 
adjusted absolute recall scores re- 
vealed no differential RI effects re- 
lated to age. One might speculate 
that the decreased learning ability of 
the older Ss was a PI effect resulting 
from their many years of previous 
learning. When the recall scores were 
“corrected” for this, the actually 
obtained negative relation between 
raw recall and age was eliminated. 
Among the more clinical subject
variables, Cassel (1957) reported no
differential RI susceptibility between
Ss of normal mentality and those
with mental deficiency. Sherman
(1957) found that psychopaths
showed better retention than either
neurotics or normals, measured by
total forgetting scores. Livson and
Krech (1955) reported a moderate 
positive correlation between recall 
and scores on the KAE (Kinesthetic 
Aftereffect Test, which was related to 
Krech’s cortical conductivity hy- 
pothesis). 

The importance of set factors, 
generally called warm-up effects, has 
been recognized (Irion, 1948). Thune 
(1958) showed that recall was significantly
facilitated by a preceding appropriate
warm-up. If OL was from a
memory drum and IL from a filmstrip,
then a memory drum warm-up
facilitated recall, but a filmstrip
warm-up did not. [...]
more transitory, [...]
groups were used.

The effects of such extrinsic variables
upon PI have not yet been investigated,
[...] line of research [...]
much of our everyday forgetting is
attributable to such context-associated
factors.

[a] ... interpolation immediate[...]
either to original learn[...]
retroactive inhibition than [...]
those two extre[...] (p. 15).
[b] ... the more recent studies suggest an inverse
relationship between length of time
interval and relative retroactive inhibition
(p. 16).


Subsequent work has called for a 
modification of those statements. 

Examination of the RI paradigm
reveals three manipulable temporal
intervals: end of OL to start of IL,
end of IL to start of RL, and end of
OL to start of RL. No single experiment,
while keeping the IL learning
period constant, can vary only one of
these intervals without automatically
changing one of the others. When
the IL learning period varies (as in
studies giving different numbers of
IL trials) while the OL-IL and the
OL-RL intervals are kept constant,
then the IL-RL interval will inevitably
vary. Therefore, in the
study of any one of these variables,
confounding is inescapable. There
is no easy way out of this dilemma.
The only technique approaching a
solution seems to be to do several
separate experiments, confounding a
different pair of intervals each time,
and then evaluating the results of all
the experiments by determining
which confoundings have no effect.
This more elaborate approach has
not been used in actual practice;
rather, acceptance of such confounding
seems to be the rule.
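
The arithmetic behind this confounding can be made explicit with a small sketch. It is offered only as an illustration: the class name, field names, and the numbers (treated as minutes) are hypothetical and come from no study reviewed here. It assumes that the OL-RL interval is simply the sum of the OL-IL interval, the IL learning period, and the IL-RL interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RIDesign:
    """Timing of one RI condition (hypothetical units: minutes)."""
    ol_il: float        # end of OL to start of IL
    il_duration: float  # length of the IL learning period
    il_rl: float        # end of IL to start of RL

    @property
    def ol_rl(self) -> float:
        # Assumed identity: the third interval is fixed by the other quantities.
        return self.ol_il + self.il_duration + self.il_rl


def changed_intervals(a: RIDesign, b: RIDesign) -> list:
    """Name the temporal intervals that differ between two conditions."""
    names = ("ol_il", "il_rl", "ol_rl")
    return [n for n in names if getattr(a, n) != getattr(b, n)]


# Illustrative conditions; the IL learning period is held constant at 20.
base = RIDesign(ol_il=10, il_duration=20, il_rl=30)
later_il = RIDesign(ol_il=25, il_duration=20, il_rl=30)       # lengthen OL-IL only
later_il_fixed = RIDesign(ol_il=25, il_duration=20, il_rl=15)  # hold OL-RL fixed
```

Under these assumptions, lengthening the OL-IL interval also lengthens OL-RL, and holding OL-RL fixed instead forces the IL-RL interval to shrink, so two of the three intervals always move together.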

Varying the IL-RL interval allows
for measurement of progressive
changes in the strength of RI and
PI, and deductions concerning the
events that occur in that time. Underwood
(1948a), using IL-RL intervals
of 5 and 48 hr., and Briggs
(1954) at 4 min. to 72 hr., report no
significant changes in magnitude of RI.
Deese and Marder (1957), using
unlimited response times, from intervals
of 4 min. to 48 hrs., and Peterson
and Peterson (1957) from 0 to [...]
min., both found no changes in recall.
Slight RI decreases were reported by
Jones (1953) from .17 to 24 hrs.
(with an increase from 24 to 144
hrs.) and by Ishihara (1951). Using
the uncommon A-B, C-D design
with very high levels of practice,
Rothkopf (1957) found an increase
in recall from 0 to 21 hrs., but no
control groups were used. From the
trend of these results, the best conclusion
seems to be that RI remains
relatively stable over time, at least
up to 72 hrs.

In examining the temporal course
of PI, Underwood (1949b) found no
change from 20 to 75 min., but (Underwood,
1948a) did find a drop in
recall from 5 to 48 hrs. (no control
groups), and Jones (1953) also reported
increasing PI. In a study not
explicitly designed to assess PI,
therefore lacking control groups,
Greenberg and Underwood (1950)
also found a significant drop in List
2 recall from 10 min. to 5 hrs. to 48
hrs. In spite of the lack of appropriate
controls in some of these studies,
the results are in sufficient agreement
to allow the conclusion that PI shows
a gradual increase through time,
which is in accord with logical expectations,
as Underwood (1948a)
has pointed out.

In comparing the relative strengths
of RI vs. PI through time under
comparable conditions, Underwood
(1948a) found that RI was greater at
5 hrs., but that there was no difference
at 48 hrs. Jones (1953) and
Rothkopf (1957) reported similar
observations. Underwood hypothesized
that the failure of List 1 recall
to diminish might be due to a process
of gradual recovery of OL responses
after their unlearning during IL.
This led to the use of the modified
free recall (MFR) procedure as a
method of assessing response dominance.
In MFR, S is given a stimulus
item common to both lists and asked
for the first response that comes to
mind. It was felt that such unrestricted,
uncorrected recall would
provide a fairer estimate of the rela- 
tive strengths of the competing re- 
sponses, although it was clearly not 
intended to be equivalent to the re- 
stricted recall required for RI meas- 
ures. Underwood (1948b) gave MFR 
at 1 min., 5, 24, and 48 hrs. after IL 
and found no change in OL responses, 
a consistent drop in IL responses, 
and a rise in “other” responses. He
concluded that: 

These data are given as further support of the 
interpretation of unlearning of the first list 
as being similar to experimental extinction. 
The fact that no decrease in the effective 
strength of the first list responses takes place 
over 48 hrs. suggests that a process running 
counter to the usual forgetting process is 
present. It is suggested that this mechanism 
may be likened to spontaneous recovery
(p. 438).
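
The MFR tally described above (OL responses, IL responses, and "other" responses) can be illustrated with a minimal sketch. It assumes an A-B, A-C arrangement in which every stimulus has exactly one first-list and one second-list response; the function name `score_mfr` and the sample items are hypothetical and are not taken from Underwood's materials.

```python
from collections import Counter


def score_mfr(ol_pairs, il_pairs, responses):
    """Tally MFR responses as 'OL', 'IL', or 'other' (hypothetical scorer).

    ol_pairs and il_pairs map each stimulus to its first-list and
    second-list response; responses maps each stimulus to the single
    response S gave (None for an omission).
    """
    tally = Counter({"OL": 0, "IL": 0, "other": 0})
    for stimulus, given in responses.items():
        if given is not None and given == ol_pairs.get(stimulus):
            tally["OL"] += 1
        elif given is not None and given == il_pairs.get(stimulus):
            tally["IL"] += 1
        else:
            tally["other"] += 1  # extra-list response or omission
    return dict(tally)


# Invented items for illustration only, not data from any study.
ol = {"happy": "table", "quiet": "river", "sharp": "candle"}
il = {"happy": "garden", "quiet": "window", "sharp": "basket"}
given = {"happy": "table", "quiet": "window", "sharp": None}
print(score_mfr(ol, il, given))
```

Repeating such a tally at each retention interval would give the three curves (OL, IL, and "other") that the passage compares.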


Concerning OL responses, it seems 
unnecessary to hypothesize two opposing
tendencies (recovery vs. “usual
forgetting”) canceling each other
out, as it were, to account for a find- 
ing of no change. The usual forget- 
ting curve might not necessarily be 
expected of OL responses, since the 
effects of IL could be such as to 
obliterate, through differential un- 
learning, more of the weak than the 
strong responses, leaving the strong, 
stable ones that are more resistant to 
the “usual forgetting” process, in 
the preponderance. List 2 responses, 
not so selectively eliminated, would 
be expected to decrease in time. In 
support of this alternate view we call 
attention to two relevant bits of evi- 
dence. Deese and Marder (1957) 
found that the number of items re- 
called after interpolation remained 
constant over intervals of 4 min., 2, 
24, and 48 hours after IL. Also, 
Runquist (1957) found that resist- 
ance to RI was positively related to 




constructs as follows: The construct
of generalization is “the tendency for
a response Ra learned to Sa to occur
when Sb (with which it has not been
previously associated) is presented”
(p. 204). The construct of differentiation
is “a progressive decrease in
generalization as a result of reinforced
practice with Sa-Ra and reinforced
presentation of Sb” (p. 205).
A curvilinear growth function of the
generalization tendency as practice
trials increase is stressed. Essentially,
RI is related to the degree of discriminability
of the two lists, such
discriminability being a positive function
of their respective degrees of
learning, and a negative function of
the time elapsed since learning.
Spontaneous recovery of generalization
tendencies (wrong responses)
through time is assumed. From
these postulates, several deductions
concerning RI w 


some of these ha: 
confirmed: i 


function of 


tions among the items (Gibson, 1941; 
Hamilton, 1943), and the curvilinear
RI function obtained as the degree of
IL increases (Melton & Irwin, 1940).
Among the deductions tested but not 
confirmed is one 
temporal point of interpolation prob- 
lem. Gibson feels that one of the 
reasons for the disparity of results on 
this question lies in the neglect of the 
importance of the degree of acquisition
of the lists. She predicted that
acquisition level would be found to 


temporal point of 


ion was tested by 
Underwood (1951) using
three levels of IL acquisition (6/10,
10/10, and 10/10+5 trials) and three
OL-IL intervals (0, 24, and 48 hrs.),
but no interaction between them could be
found. RI control groups were not
used, and in light of the
importance of this study it would
seem advisable to re-examine these
variables with a design adaptable to
relative RI measures. The authors
themselves expressed dissatisfaction
with the outcome and “felt that a
modification of the conditions in our
design would indicate the temporal
position to be a factor” (p. 289).

Considering the general reaction
toward Gibson’s theory in succeeding
RI work, we feel that, on the whole,
it has been favorably received, and
it has been given a certain amount of
implicit corroboration by way of being
compatible with many findings
(for instance, Briggs, 1957) and has
potential for even further development.
It has not, however, stimulated
a comprehensive series of experiments
aimed at testing the many
RI deductions implicit within it. The
reason for this is certainly not any
lack of clarity in the postulates. One
present weakness seems to be the
lack of direct evidence for a spontaneous
recovery process influencing
RI.


Melton and Irwin (1940) introduced
their two-factor theory within
the framework of a study of RI as a
function of the degree of IL. OL was
5 trials on an 18-item serial nonsense
syllable list, followed by 5, 10, 20, or
40 trials on an IL list. Relying upon a
count of the overt interlist intrusions as
an objective index of the degree of
competition between original and
interpolated responses at recall, they
found that the curves of amount of
absolute RI, and the number of such
intrusions (multiplied by a factor of
2 to do justice to partial intrusions)
were not highly correlated. (The
theoretical importance of intrusion
counts gained its ascendancy with
this study.) Rather, interlist intrusions
increased to a maximum at
intermediate IL levels and then declined
markedly, whereas the curve
of RI rose sharply and maintained a
relatively high level, declining slightly
at the highest degree of IL. That
portion of the RI attributable to
direct competition of responses at
recall was at a maximum when OL
and IL were about equal in strength.
Therefore, to account for the remainder
of the obtained RI not accounted
for by overt competition,
Melton and Irwin postulated another
factor at work, tentatively identified
as the direct “unlearning” of the
original responses by their unreinforced
elicitation, or punishment,
during IL. The growth of this “Factor
X” was assumed to be a progressively
increasing function of IL
strength. Since Factor X was almost
totally responsible for the absolute
RI at the highest IL level, and since
RI under that condition dissipated
most rapidly after a few relearning
trials, it was concluded that the
effects of such unlearning were quite
transitory. This was still a competition
of response theory in the sense
that the original responses were still
assumed to be competing at recall
with the interpolated ones, but to
that was added the factor of weakening
in OL response strength, if not
complete extinction, through the
process of unlearning.
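
The arithmetic behind Factor X can be sketched briefly. The sketch assumes, following the description above, that the competition component is estimated as twice the count of overt interlist intrusions and that Factor X is whatever absolute RI remains. The function name, the floor at zero, and all numbers are assumptions made for illustration; they are not Melton and Irwin's data.

```python
def factor_x(absolute_ri, overt_intrusions, weight=2.0):
    """Split absolute RI into a competition part and a residual part.

    competition = weight * overt intrusions (the weight of 2 is meant to
    allow for partial intrusions); the residual ("Factor X") is the RI
    left over, floored at zero as an assumption of this sketch.
    """
    competition = weight * overt_intrusions
    residual = max(absolute_ri - competition, 0.0)
    return competition, residual


# Invented values: IL trials -> (absolute RI, overt intrusions per S).
illustrative = {
    5: (3.0, 0.5),
    10: (4.5, 1.0),
    20: (5.0, 0.75),
    40: (4.75, 0.25),
}
for il_trials, (ri, intrusions) in illustrative.items():
    competition, residual = factor_x(ri, intrusions)
    print(il_trials, competition, residual)
```

With values shaped like these, the competition term peaks at an intermediate degree of IL while the residual keeps growing, which is the pattern the two-factor account was introduced to describe.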
The presence of confounding between
the degree of IL and the end
of IL-start of RL interval was
pointed out by Peterson and Peterson
(1957) as a possible alternative account
of the differences in intrusions
obtained by the Melton and Irwin
design. With a fixed OL-RL interval
the IL-RL interval shortens, with
increasing IL trials taking more time.
However, another study of the effects
of degree of IL did use a fixed IL-RL
interval (with a correspondingly
varying OL-RL interval—Osgood,
1948) and still found comparable
intrusion changes.

A direct deduction from the two- 
factor theory is that RI, being a re- 
sult of both unlearning and competi- 
tion effects, should be greater than 
PI, which was presumed to be the 
result of response competition alone. 
This hypothesis was tested and con- 
firmed by Melton and von Lackum 
(1941) in a study using five trials on 
each of two 10-item consonant lists, 
and has also been given further gen- 
eral support by others (Jones, 1953; 
McGeoch & Underwood, 1943; Un- 
derwood, 1942, 1945). Underwood 
(1948a) in yet another study also 
found greater RI than PI at 5 hrs.; 
but at 24 hrs. they were equal. His 
resulting postulation of spontaneous 
recovery of the OL, and the subse- 
quent developments of that concept 
have been discussed above. 

Later, certain other observations 
led to some discontent with the two- 
factor theory. In an experiment de- 
signed to test the generalizability of 
the Melton and Irwin findings to 
paired-adjectives lists, Thune and 
Underwood (1943) used an A-B, 
A-C design with five OL trials and 0, 
5, 10, or 20 IL trials. Their results 
confirmed the existence of a nega- 
tively accelerated function between 
RI and degree of IL, as well as the 
fact that overt intrusions were maxi- 
mal at the intermediate IL levels 
(10 trials) and declined sharply by 
the 20-trial level, while RI still re- 
mained massive. However, there was 
no difference in the rate of RI dis- 
sipation between the 10 and 20 trial 
IL levels, and therefore the transi- 
toriness of RI at these levels could 
not reasonably be attributable to the 
unlearning construct. The two-factor
theory would have been forced to
predict faster dissipation at the 20
IL level, since overt intrusions were 
far less for it than for the 10 IL level. 
In addition, the curve of Factor X 
drawn for the Thune and Underwood 
data was quite different in shape 
from that obtained by Melton and 
Irwin, and it was felt to imply rather 
incongruous psychological properties 
for a curve of unlearning. In addi- 
tion, an item analysis revealed that 
almost half of the overt intrusions 
took place on items where the original 
response had never been reinforced 
(or correctly anticipated) at all! 
Therefore, such interlist intrusions 
could not be legitimate indicators of 
response competition, since those 
responses had never been learned 
during OL, and were simply not 
available to be competing with any- 
thing. It is also to be expected that 
for original responses to be unlearned 
they would have to occur during IL 
in sufficient frequency to be subject 
to punishment or lack of reinforce- 
ment. Yet, as Osgood (1948) pointed 
out from his data, the number of 
related original list intrusions during 
IL was “infinitesimally small” and 
could not possibly account for much 
unlearning at all. This previously 
observed discrepancy between the 
assumed growth of Factor X and the 
lack of increase in intrusions during 
IL as a function of increasing IL 
trials should be tempered with the 
possibility that partial intrusions
could still play a large role in deter- 
mining the degree of unlearning ob- 
tained, and such intrusions are not 
easily detected and counted. 

Thune and Underwood (1943) suggested
that the ratio of overt to covert
(and partial) errors need not necessarily
remain constant, but may undergo
progressive change as a function
of the degree of IL, therefore
accounting for the drop in overt
intrusions by postulating an increase
in implicit interference. In a subsequent
paper Underwood (1945) elaborated
upon this suggestion and
formalized his differentiation theory.
The shift in error ratios was interpreted
as a resultant of two simultaneous
processes: increasing IL associative
strength tending to produce
more overt intrusions, but being
gradually overcome by the growth of
differentiation, tending to reduce the
intrusions. The magnitude of the differentiation
construct was held to be
a positive function of the degree of
learning of both lists and a negative
function of the time between the end
of IL and the start of RL. A decrease
in overt intrusions was, in effect, the
index of increasing differentiation.
When the two lists are about equally
well learned, intrusions are maximal
and differentiation is low; but with
increasing disparity between their
absolute or relative acquisition levels
intrusions are reduced, indicating
increased differentiation. By the
same token, a short IL-RL interval
should also produce higher differentiation.
That this is in fact the
case was shown in the Archer and
Underwood study (1951) where overt
intrusions declined as the IL-RL
interval became shorter. The increasing
differentiation allows S to
recognize and withhold erroneous
responses, resulting in fewer interlist
intrusions and more covert or omission
errors. Differentiation was described
phenomenologically by Underwood
(1945, p. 25) as being

related to the verbally reported experience of
“knowing” on the part of the subject that the
responses from the interpolated learning are
inappropriate at the attempted recall of the
OL. Degree of differentiation in this sense is
thus an indication of the degree to which the
subject identifies the list to which each
response belongs.


Empirical support for various aspects
of the theory has come from several
sources (Archer & Underwood,
1951; Osgood, 1948; Thune & Underwood,
1943; Underwood, 1945). Furthermore,
the fact that intrusion frequencies
change but RI still remains constant
may be simply a function of
the limited recall time (usually 2
sec.) available to S. If this recall
time were extended, then perhaps
S would have sufficient time both to
recognize the erroneous and verbalize
the correct response, thus displaying
a drop in RI at high IL (differentiation)
levels. Underwood (1950a,
1950b) tested this promising hypothesis
but found no dropping off of the
Melton and Irwin effect, and
concluded that differentiation does
not change as a function of increased
recall time. Unlearning was
therefore still retained as a useful
concept but, since it was shown that
the response weakening took place
only in the first “few” IL trials,
it was [...] by associative inhibition,
and because of the relatively great
stress put upon the role of differentiation,
Underwood's revision of the
two-factor theory became an important
independent influence upon subsequent
RI thinking.

The apparent similarities between
Underwood's differentiation
construct and Gibson’s concept of
differentiation deserve to be pointed
out at this time. For both theorists,
differentiation is in part a positive
function of degree of reinforced
practice on the material, such practice
serving to reduce overt intrusion
errors. Secondly, temporal relationships
also play a large part in determining
the strength of both
constructs. However, the two postulates
do differ with regard to certain
important aspects of operation of
these determiners of differentiation.


Underwood’s concept refers to the 
more global process of S correctly 
assigning the list membership of the 
responses, whereas Gibson speaks of 
discrete S-R connections in competi- 
tion. Furthermore, Underwood's 
theory is derived from experiments 
based largely upon the A-B, A-C 
design, whereas for Gibson, generali- 
zation as defined requires that the 
stimulus members be similar, but not 
identical. For Underwood, increas- 
ing differentiation is marked by a 
reduction of intrusions and an in- 
crease in omissions, but no drop in 
RI, whereas Gibson implies that in- 
creasing differentiation will result 
directly in improved performance. 
And finally, Gibson makes spon- 
taneous recovery an integral part of 
her differentiation concept, while it 
was not until later that Underwood
postulated a spontaneous recovery
process, and that was reserved for
the unlearning aspect of his theory.
tical formulation to
be considered was put forth by Osgood
(1946). It stemmed from his
investigations of the RI effects of


involved ut 
rocal inhibition of antagonistic reac- 
tions, wherein “simultaneous with 


ponse the S is also 
make the directly 


antagonistic response” (Osgood, 
1948, p. 150). This was clearly an 
application of the reciprocal inhibi- 
tion concept of neurophysiology to 


the area of verbal behavior. In pur- 
suing the tenability of this position, 
two relevant transfer studies have 
shown that the learning of both similar
and opposed List 2 responses was
equally rapid, and much easier than 
learning neutral responses (Ishihara 
& Kasha, 1953; Ishihara, Morimoto, 
Kasha, & Kubo, 1957), thus failing to 
confirm the hypothesis. Unless further
support for the hypothesis is
forthcoming, we must conclude that 
it will not become an important in- 
fluence in RI work. 

With regard to the question of the 
adequacy of the two-factor theories 
we are of the opinion that the con- 
cept of unlearning is a valuable one, 
but that an acceptable measure of its 
magnitude has not yet been devised. 
Interlist intrusions were proposed 
only as a partial index, but not as a 
complete measure of its effects, and 
the difficulties encountered by such 
an index have been enumerated 
above. Instructions calculated to 
encourage the verbalizing of errors
do just that: Morrow (1954) and 
Bugelski (1948) found that “all that 
is required to obtain a large number 
of such errors is to ask for them” 
(p. 680). 

Two interesting proposals have
been advanced as methods for distinguishing
operationally between effects
of competition and effects of unlearning.
Postman and Kaplan
(1947) spoke of two measures of RI:
error scores, and the reaction times
for correct responses (residual retroaction).
These two measures were
found not to be correlated and are
thus of necessity measures of two
different processes. They suggest
that: “It is possible that retention
loss (error scores) reflects the effects
of unlearning, whereas reaction times
may depend primarily on competition
between responses” (p. 143).
Their experiment 


been tested. 

Later, Postman and Egan (1948)
proposed that the rate of recall of correct
responses be a measure of unlearning.
Retention was measured
by the free recall procedure, and
performance was recorded both in
terms of number of items recalled
as well as by the rate of emission of
correct items, per 3-sec. periods.
They state that
The two types of measures—amount lost and
rate of recall—may be regarded as measures
of these two processes (unlearning and competition,
respectively). Those aspects of OL
which have been unlearned cannot be evoked
on retest: unlearning leads to decrement in
amount retained. Other aspects suffer competition
from the IL but are not unlearned.
They are potentially available but “disturbed,”
and manifest that in a slower rate of
recall (p. 543).


These are both valid and constructive
formulations deserving of further
attention, but no significant efforts
have as yet been made to test their
usefulness in predicting data crucially
relevant to the unlearning factor.

These experiments by Postman
point in a new direction, suggesting
that such evidence for competition
of response is a result of the brief
recall times used in RI studies. If
competition of response results in
increased latencies, then decrements
in recall may come when the latency
of a response exceeds the 2-sec. interval
usually used. Underwood's
(1950a) study, which found no [...]
with an 8-sec. recall interval, supports
this possibility.
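
The latency argument can be made concrete with a small sketch. It assumes that every response is still available and that a response is scored as recalled only when its latency falls within the recall interval. The latencies are invented and the function name is hypothetical.

```python
def recalled_within(latencies_sec, interval_sec):
    """Count responses whose latency does not exceed the recall interval."""
    return sum(1 for latency in latencies_sec if latency <= interval_sec)


# Invented latencies (sec.) for ten available first-list responses.
control = [0.8, 1.0, 1.1, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
after_il = [1.2, 1.5, 1.8, 1.9, 2.2, 2.6, 3.1, 3.8, 4.5, 6.0]

for interval in (2, 8):
    loss = recalled_within(control, interval) - recalled_within(after_il, interval)
    print(f"{interval}-sec. interval: apparent decrement = {loss} items")
```

Under these assumptions the apparent decrement is confined to the short interval, which is what a pure latency account would predict. A decrement that persists at long intervals would instead point to responses that are not available at all.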
An experiment by Ceraso (1959)
may provide further support for such
a hypothesis. With an A-B, A-C
design, Ss were asked to recall both
the first and second list responses and
also to assign these to the proper list.
Since a 20-sec. (maximum) recall
interval was used, blocking due to
competition of response should not
be expected. An analysis of the first
list responses which were correct on
the last trial of OL, and were then
scored as incorrect at recall, showed
that the reason for the forgetting was
simply the unavailability of the response.
When the response was available,
it was also assigned to the
correct list. Since competition of response
should reveal itself as a misassignment
of the response, it was
concluded that the forgetting obtained
could not be accounted for by competition.
Using a technique somewhat
related, with an A-B, A-C
design, Barnes and Underwood
(1959) obtained similar results, and
accordingly rejected a competition
explanation.
Ceraso also found that in a large
number of cases S could give both responses
to the stimulus. But does not
the unlearning hypothesis imply
that learning the second list response
involves the unavailability of the first
list response? The answer that immediately
suggests itself is that unlearning
is a function of the degree of
first and second list item learning.
Therefore, an item analysis of the
kind performed by Runquist was
undertaken. The result showed that
degree of learning of the second list
item did not affect the retention of
the first list item, thus verifying
Runquist’s (1957) original finding.
It seems that the latter data pose a
real problem for current theories of
RI, since the basic mechanism usually
postulated requires interaction
between associations with similar or
identical stimulus items. Both the
Runquist and Ceraso findings seem
to indicate a nonspecific mechanism.
Learning a second list affects the
entire first list, regardless of the
specific item interactions.
In conclusion, it appears that the
major theoretical accounts of RI
have remained relatively unchallenged
and unchanged for
many years, in spite of the accumulation
of considerable empirical data.
It is hoped that this overview of the
current state of the field will help to
initiate a more vigorous and sustained
effort toward an improved
theory of forgetting.


SUMMARY 


For a concluding statement we feel 
it would be appropriate to enumerate 
some of the pressing problems and 
empirical gaps currently evident in 
the status of our knowledge of RI 
and PI. These points are presented 
in the order of their appearance in 
the foregoing review and do not re- 
flect any opinion regarding their rela- 
tive importance. 

1. Reconsideration of the relative
merits of RI quantification: absolute
RI, relative RI, and total forgetting

2. Determinants of the RI and PI
of individual items

3. Determinants of the rate of PI
dissipation

4. Development of an objectively
quantitative scale of similarity for
use in constructing lists of items

5. Reappraisal of the right half of
the response dimension of Osgood’s
retroaction surface

6. [...] and PI, with responses the same

7. Effects of varying response similarity
upon PI, with the “categorization
approach” error eliminated

8. Further study of the RI effects
of mediated and unidirectional prepotent
association between list items

9. PI as a function of similarity
relations within the A-B, C-D design

10. Determinants of the RI and
PI of connected discourse

11. Relative importance of proprioceptive
vs. exteroceptive extrinsic
cues for recall

12. RI effects of incidental vs. intentional
acquisition conditions, with
degree of acquisition controlled

13. Effects of the affective characteristics
of the material upon its
RI and PI

14. Better handling of the problem
of confounding which arises when
temporal intervals are manipulated

15. Further tests of the validity of
the spontaneous recovery hypothesis

16. Examination of the point of
interpolation problem as a function
of other attendant variables

17. RI as a function of presentation
rate, and of massing vs. distributing
trials

18. Testing of the two-factor theory
through an improved measure of
the unlearning construct


in ChE, and the crosses tended to
overlap both groups. 

While this second hypothesis was 
outlined in our initial publication 
(Krech, et al., 1954), it was spelled 
out more fully—in both its behavioral 
and its biochemical implications— 
in the 1955 Wisconsin Symposium 
(Rosenzweig, Krech, & Bennett, 
1958a). The assumption that animals 
with spatial preferences were gen- 
erally superior in adaptive behavior 
to animals with visual preferences 
was supported by the following ob- 
servations. We were using the un- 
solvable problem in the Krech Hy- 
pothesis Apparatus; that is, the rat 
was equally likely to be successful at 
a choice point whether he chose the 
light or dark alley or the right or left 
side. Under standard conditions of 


training, almost all animals neverthe- 
less show a light-goi 


ls (see 
1958a, Fig. 117). 


made in a dark- 


was interpreted to
mean that the animals respond to the
most dominant and insistent cue in
the problem situation. An animal
could therefore achieve a spatial
preference only by ignoring the
dominant cue, abandoning its initial
hypothesis, and adopting the less
obvious hypothesis of location in
space. The abilit


ency in adaptive 
We therefore concluded 


nimal was more 


On the biochemical side we made
specific, in the Wisconsin Symposium,
two points that had only been
suggested in the initial preliminary
report to Science: the major biochemical
variable upon whose assumed
function our hypothesis
rested, was the transmitter substance,
acetylcholine (ACh); and
ChE activity in our hypothesis was
employed as an index to the availability
of ACh.

Our argument in support of these 
points was as follows: (a) The trans- 
mitter substance, ACh, is importantly 
involved in neural transmission in 
the central nervous system. It is now 
generally acknowledged that when 
a neural impulse reaches the end of a
presynaptic neuron, a chemical transmitter
is released which diffuses
across the synapse and excites the
post-synaptic neuron. (For a recent
review of chemical transmission in 
the central nervous system, see 
Crossland, 1960.) ACh is the trans- 
mitter substance at many peripheral 
and central synapses. The released 
ACh, after exciting the post-synaptic 
neuron, is promptly hydrolyzed and 
inactivated by the enzyme ChE, and 
the synaptic junction is returned to 
its prior state. (b) Animals with
greater availability of ACh could
therefore show readier transmission
of nerve impulses. (c) Animals with
relatively efficient transmission systems
(higher rates of ACh functioning)
would tend to manage new problems
more effectively than animals
with less efficient transmission systems
(low ACh functioning rates).
This latter assumption was tempered
by the phrase “within limits” since
it is possible that beyond some point,
ready transmission of impulses could
result in explosive, undisciplined, and
unintegrated behavior. (d) The ACh
transmitter system includes two enzymes,
choline acetylase (ChA) which
synthesizes ACh, and ChE which
breaks down ACh. While our major
concern was with ACh, we decided
to use one of the enzymes as an index 
rather than to measure ACh con- 
centration directly, for the following 
reasons: 


ACh is synthesized and destroyed contin- 
uously, and its concentration alters rapidly 
even with temporary changes in the functional 
state of the animal. Furthermore, when an 
animal is sacrificed for assay, special precautions
must be employed to stop at once all
chemical activity in the brain, otherwise a 
large part of ACh will quickly be destroyed 
by the ChE present. These precautions in- 
volve dealing with frozen brain, from which 
it is difficult to take precisely defined cortical 
samples. Finally, no reliable chemical tech- 
nique is at present available for the measure- 
ment of ACh; only bio-assay methods can be 
used. Of the ChA-ACh-ChE system, ChE is 
a stable component for whose measurement 
we had available a relatively simple and 
extremely reliable chemical technique (Nei- 
lands & Cannon, 1955). This fact, together 
with some evidence which indicated that ChE 
was related to the activity levels of the other 
members of the system led to the decision to 
use ChE as an index to the availability of its 
substrate, ACh. However, we made this de- 
cision with some misgivings and noted that it 
would be important “to test the use of ChE 
as an index to ACh metabolism, since this 
is not founded upon direct observation” 
(Rosenzweig, et al., 1958a, pp. 397-398). 


Thus, our second experimental
hypothesis—that a positive correla- 
tion exists between efficiency in 
adaptive behavior and the overall 
ChE level of the cortex—rested upon 
three major assumptions: animals 
which showed a spatial preference in 
the Krech Hypothesis Apparatus had 
a higher adaptive capacity than ani- 
mals which showed a visual prefer- 
ence, the availability of cortical ACh 
was positively related to synaptic
transmission efficiency, cortical ChE
activity provided a good index to the
availability of ACh.


Supporting Data 


Three types of experimental evidence
appeared to offer support for
our second experimental hypothesis
in our 1955 report: (a) Tests of additional
animals continued to show
that those with spatial preferences 
had greater cortical ChE activity 
than those with visual preferences 
(Rosenzweig, et al., 1958a, pp. 385-388).3
(b) The intercorrelations
among ChE measures in the three 
cortical areas (visual, somesthetic, 
and motor) were consistently posi- 
tive and of the order of .60 (Rosen- 
zweig, et al., 1958a, Table 30.) This 
supported the suggestion that the 
level of ChE activity of the rat's 
cortex is a general characteristic of 
its brain biochemistry. (c) Small 
doses of pentobarbital sodium were 
found to fixate the animals’ initial 
preferences and to prevent them from 
shifting to spatial hypotheses (Rosen- 
zweig, Krech, & Bennett, 1956, 
1958a, pp. 391-396). The use of this 
drug was suggested by the fact that 
in appropriate small concentrations 
it has been shown in vitro to reduce
the rate of synthesis of ACh (Mc- 
Lennan & Elliott, 1951), and we 
therefore expected it to lower adap- 
tive capacity. The behavioral effect 
was in accordance with our predic- 
tion, and Moroz (1959) found that 
pentobarbital could also fixate ani- 
mals on spatial preferences. Never- 
theless, we do not now know how 
much weight should be given to this 
evidence, for the following reasons: 
we did not determine whether the 
rate of synthesis of ACh was actually 
reduced in the brains of our animals, 
and extrapolation from the in vitro 
to the in vivo condition is problematical;
pentobarbital has other effects in
the brain which also tend to reduce
the brain’s responsiveness.


3 Pierce (1959) used a prol[...]
of this test (the [...]
removed) in which the behavior presumably
afforded a measure of sensory preference; in
this case the correlation of preference with
ChE activity was significant and opposite in
sign from what we had obtained in the problem-solving
situation.

In 1956 (Krech, et al.), a new test 
of the hypothesis was employed. In- 
stead of presenting the animal with 
an unsolvable problem—the stand- 
ard procedure for testing for hypoth- 
esis preference—a “progressively 
solvable” problem in the Krech Hy- 
pothesis Apparatus was employed. 
On the first day of testing the animals 
were presented with the usual un- 
solvable problem, during which they 
tended to adopt a visual hypothesis. 
The animals were then divided into 
two groups: for one group, the left 
alley was made progressively more 
often correct on succeeding days 
(the spatial problem); for the other 
group, the lighted alley was made 
progressively more often correct on 
succeeding days (the visual problem). 
The results of this experiment with a 
partially solvable problem were con- 
sistent with our prediction that high 
ChE activity favored relatively rapid 
shifts in the animal’s hypotheses 
when the situation demanded such 
shifts. This generalization held for 
both problems. 

The next step in the testing of our hypothesis was to
determine whether relations exist between learning and
ChE activity, using standard animal learning tasks. We
were encouraged to take this step, having already found
a relationship between hypothesis preferences and ChE in
an unsolvable problem and between hypothesis shifts and
ChE in a partially solvable problem, using the Krech
Hypothesis Apparatus for both problems. Furthermore, we
wished to follow up the observation of strain differences
made at the start of our research: that the S1 strain
showed higher ChE activity than the S3 strain. This
observation had meanwhile been repeatedly confirmed by
the chemical analysis of large numbers of S1 and S3 rats
run in succeeding experiments. In these two strains,
then, we had animals of known ChE characteristics. We
therefore attempted to determine whether the animals of
the S1 strain would be superior to those of the S3 strain
in their performance on the various learning tasks.
Indeed, if our hypothesis had any validity, it was
necessary to predict such strain differences in behavior.
Our first exploratory experiment (in 1956, unpublished)
tested animals of these two strains on the Lashley III
Maze. The results, though based on rather small groups,
were encouraging. We then began a more systematic
examination of the learning performance of these two
strains, using three different learning tasks: the
Lashley III Maze, the Hebb-Williams Maze, and the
Dashiell Maze. A total of 77 S1 and 80 S3 male rats was
divided among the three tests. In each of these tests
the S1 animals made significantly fewer errors than the
S3 animals. (See Table 1. The data for 13 A of these S1
and S3 rats were reported at the 1958 APA meeting
(Rosenzweig, Krech, Bennett, & Longueil, 1958c), and the
data for all 157 animals were presented at the 1959
Pittsburgh Symposium (Rosenzweig, Krech, & Bennett,
1959).)
TABLE 1
STRAIN DIFFERENCES IN BEHAVIORAL TESTS

                Mean Errors      Correlations:             Mean Errors
                                 Errors vs. ChE
Maze            S1      S3       Signs of     Combined   RCH     RCL    RDH    RDL
                                 indiv. r's   r
Hebb-Williams
  Score         66***   78       =d           .03        90      86     90*    82
                (37)    (89)     (30)                    (20)    (20)   (26)   (30)
Dashiell
  Score         24***   33       ++           wee        28***   18     18     20
                (28)    (27)     (24)                    (25)    (25)   (30)   (30)
Lashley III
  Score         19***   42       +++          -188       34**    24     24     24
                a4)     a49      (33)                    (22)    (22)   (22)   (22)

Note. Most of these data were presented at the 1959
Pittsburgh Symposium (Rosenzweig, et al., 1959), but data
for 93 additional animals of the Roderick strains are
included here (25 additional RCH, 25 RCL, 21 RDH, and
22 RDL).

About this time we obtained other biochemical measures
which seemed to offer further support for our hypothesis.
The observed relation between ChE and learning ability
was assumed, of course, to reflect the fact that the
ACh-ChE system is of peculiar significance to cerebral
functioning because of its contribution to synaptic
transmission. However, since the ACh-ChE system is but
one of a number of biochemical systems which are
important in brain functioning, the possibility remained
that ChE activity merely reflected the general enzyme
and metabolic activity in the rat brain. The observed
behavior-ChE relations would not, in this case, depend
upon the specific role of ChE in synaptic transmission.
We therefore started on a program to determine what
relationships exist between measures of ChE and other
substances. Our first experiment (Bennett, Krech,
Rosenzweig, Karlsson, Dye, & Ohlander, 1958a) in this
series compared ChE activity with lactic dehydrogenase
(LDH) activity in the cerebral cortex and subcortex of
S1 and S3 rats. (The term "subcortex" was used to refer
to the whole brain after the dorsal cortex had been
removed.) LDH was chosen since it is an enzyme important
in brain metabolism but apparently plays no specific
role in the transmission process. The results, based on
ChE and LDH analyses of 106 male animals of several age
groups, showed clear differences between distribution of
activity of the two enzymes, for example: ChE activity
showed a more sharply differentiated pattern of regional
distribution within the cortex than did LDH; within the
age range studied (30 to 150 days), the change in
enzymic activity with increasing age was more marked for
ChE than for LDH; within each strain separately, there
was some indication of a modest positive correlation
between cortical LDH and ChE, while for the subcortex
there was no correlation; whereas the S1 animals were,
of course, higher in ChE activity than the S3 animals
for both cortex and subcortex, no differences were found
between the strains for LDH.

The second study in this series 
(Bennett, Rosenzweig, Krech, Oh- 
lander, & Morimoto, in press) ex- 
amined the relationship between ChE 
and percent brain protein. In general
we found the same results that
we had in our LDH study. This
time, however, we had available
animals from four Roderick strains
(to be described in the next section),
as well as the S1 and S3 strains. The
brains of 226 animals, ranging in age 
from 6 to 470 days, were chemically 
analyzed for percent protein and ChE 
activity. Some of the main differ- 
ences found for all strains between 
ChE and percent protein were the 
following: relative values for differ- 
ent regions of the cortex differed 
greatly for the two substances; ChE 
activity was found to be much more 
dependent upon age than was percent 
protein; ChE and percent protein 
showed no correlation with each 
other in cortex, and only a moderately 
low correlation in subcortex; pairs of 
strains differing markedly in ChE 
activity did not differ in percent pro- 
tein. 

The results of both these studies 
ruled out the interpretation that 
correlations were obtained between 
ChE activity and behavior only be- 
cause ChE activity simply reflects 
general biochemical characteristics 
of the brain. The lack of correlation
between ChE activity and other 
measures of biochemical activity in 
the brain is consistent with our hy- 
pothesis that the observed relations 
between behavior and ChE activity 
are to be attributed to the specific 
role of ChE activity in synaptic 
transmission. 

At the 1955 Wisconsin Symposium the criticism was made
that the differences in ChE activity that we had
reported among our various experimental animals were so
small as to be "close to or within the probable limits
of error of sampling and analysis" (Tower, 1958, p. 356).
It is true that we are dealing with small differences,
but anyone working with brain biochemistry must be
prepared to deal with small differences; we have
repeatedly shown that the biochemical variability of the
brain is small compared with that of other organs
(Bennett, et al., 1958a; Bennett, Rosenzweig, Krech,
Karlsson, Dye, & Ohlander, 1958b; Bennett, et al., in
press). This necessitates having reliable and valid
measures of brain ChE. Two kinds of evidence make it
clear that the small differences we find represent true
individual and strain differences that cannot be
attributed to "error of sampling and analysis": (a) In a
paper entitled "Individual, Strain and Age Differences
in Cholinesterase Activity of the Rat Brain" (Bennett,
et al., 1958b), we demonstrated, using data from over
400 animals, that the observed differences between
strains and among ages were consistent and highly
reliable. (b) A successful genetic selection experiment,
to be described later in this paper, was based solely on
measurements of cortical ChE activity; if the observed
differences represented only errors, no effect of
selection on ChE activity could have been achieved.
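
The logic of point (b) can be illustrated with a toy simulation. The sketch below is not the breeding procedure actually used; the function name `select_high_line`, the population size, the proportion of animals kept as parents, and the simple breeding rule (offspring scatter around the mean of the selected parents) are all assumptions made only for illustration. It shows that selecting on a measurement that is pure assay error produces no shift in the line, whereas selecting on a measurement with a heritable component does.

```python
import random
import statistics


def select_high_line(heritable_sd, error_sd, generations=6,
                     pop_size=400, keep=0.25, seed=0):
    """Mean heritable value of a 'high' line after truncation selection.

    Hypothetical model: each animal has a heritable value, and the
    measured value is that heritable value plus independent assay error.
    """
    rng = random.Random(seed)
    population = [rng.gauss(0.0, heritable_sd) for _ in range(pop_size)]
    for _ in range(generations):
        # Measure every animal: heritable value plus assay error.
        measured = [(true + rng.gauss(0.0, error_sd), true)
                    for true in population]
        # Keep the animals with the highest measured values as parents.
        measured.sort(reverse=True)
        parents = [true for _, true in measured[: int(pop_size * keep)]]
        parent_mean = statistics.fmean(parents)
        # Offspring scatter around the parental mean (a crude assumption).
        population = [rng.gauss(parent_mean, heritable_sd)
                      for _ in range(pop_size)]
    return statistics.fmean(population)


# Measurements that are only error: selection cannot move the line.
noise_only = select_high_line(heritable_sd=0.0, error_sd=1.0)
# Measurements with a heritable component: the line shifts upward.
heritable = select_high_line(heritable_sd=1.0, error_sd=1.0)
print(noise_only, heritable)
```

The default of six generations mirrors the six selected generations mentioned for the Roderick program; every other number is arbitrary.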

Our original hypothesis was now bolstered by the
observation of consistently positive correlations
between hypothesis behavior and cortical ChE among
individuals, and between learning performance and
cortical ChE among strains, and by evidence that
variations in ChE activity did not merely reflect
variations in general biochemical characteristics of the
brain. But these behavioral-biochemical correlations
were not completely satisfactory supports for our
hypothesis. The hypothesis asserted that behavioral
differences are caused by, or are a function of,
biochemical differences. Obviously it is hazardous to
argue from correlations to existence of a causal
relation. It was necessary, in other words, to devise an
experiment which would answer the following question:
were the observed correlations between our behavioral
and biochemical variables necessary or were they
fortuitous?
The traditional procedure in seeking to answer such a
question would be to manipulate experimentally the
biochemical variable (the independent variable) and
observe whether learning performance (the dependent
variable) changed in an appropriate manner. Thus, for
example, one could alter the rate of ACh synthesis or
the level of ChE activity by the use of drugs and see
whether predicted changes would occur in behavior. We
had used such an approach when we employed pentobarbital
sodium to inhibit ACh synthesis and predicted that the
injected animals would show less adaptive
problem-solving behavior than control animals (Moroz,
1959; Rosenzweig, et al., 1956). The behavioral results
had been consistent with our notions about the role of
ACh functioning, but, as we have indicated, these
results cannot be interpreted unambiguously. In another
attempt along this line in our laboratory, McGaugh and
Petrinovich (1959) found that small doses of strychnine
improved the rate of learning in rats. There is evidence
that strychnine may inhibit ChE activity (Nachmansohn,
1938), but this drug, too, has other actions, so that
its effects here cannot be attributed to the ACh-ChE
system. Successful use of a drug which is more specific
in its effect on ChE had been indicated by Russell
(1954) in a brief abstract entitled "Effects of Reduced
Brain Cholinesterase on Behavior." We also tried using a
specific ChE-inhibitor, diisopropyl fluorophosphonate
(DFP), but we were finally led
to abandon it and have not previ- 
ously reported on these attempts. 
Dosages of DFP large enough to 
produce substantial reductions in 
brain ChE had deleterious effects on 
gastrointestinal functioning. We 
therefore feared that the animals’ 
motivational state might not be 
comparable with that of the control 
animals, since we used food depriva- 
tion as the motivating condition in 
our behavioral testing. Furthermore, 
DFP was lethal for some members of 
each experimental group we employed 
(despite comparable dosages by body 
weight), and there was no way of 
knowing how this selective action of 
the drug would bias the results. The 
animals which did survive the drug 
and completed the tests showed no 
consistent differences in performance 
from their controls. 


Genetic Experiments 


Meanwhile we were preparing to 
use another approach to the prob- 
lem. Genetic selection experiments, 
it occurred to us, offered a possible 
way of developing strains with rela- 
tively high or low ChE activity. 
These strains could then be tested 
for behavioral differences. In 1955, 
Roderick, in our laboratory, under 
the guidance of E. R. Dempster of 
the Department of Genetics, began a 
selective breeding program designed 
to create strains of animals differing 
in cortical ChE activity. Roderick 
started his selective breeding from 
two heterogeneous foundation stocks
from the Genetics Laboratory—the 
Castle and the Dempster stocks. 
From each of these foundation stocks 
Roderick bred a high- and a low-ChE 
line. This program required three 
years (six selected generations) for 
its successful completion and at the
end we had available four strains of
animals of known ChE activity: the
RDH (Roderick-Dempster High
ChE) and the RDL strains, the RCH
and RCL strains (Roderick, in press).

With these strains available to us
we would be in a position to make a
crucial test of the significance of our
previously observed correlations between
behavior and cortical ChE. If
our hypothesis was valid, we should
find the RDH strain was superior to
the RDL strain, and the RCH strain
superior to the RCL strain. On the
other hand, if there were no necessary
relation between learning ability
and ChE activity, then no consistent
behavioral differences would be found
between the RDH and RDL strains
or between the RCH and RCL strains.


Another strain, the K strain, was begun by crossing the
S1 and S3 strains and was then interbred for several
generations. There were two considerations which led to
the creation of the K strain.

1. If our hypothesis was wrong and it was
merely fortuitous that the S1 strain was
superior in learning ability and higher in
cerebral ChE activity than the S3 strain, then
the association between the behavioral and
biochemical traits would be expected to dis- 
appear among the animals of the K strain. 
That is, correlations between error scores and 
ChE activity within the K strain would tend 
to approach zero if the genetic determinants 
of learning ability and brain ChE reassorted 
randomly. If, on the other hand, our hypoth- 
esis was valid and there was an intrinsic re- 
lation between cerebral ChE activity and 
learning ability, those animals of the K strain 
that had higher ChE activity would tend to 
show fewer errors in the learning tasks; that 
is, correlations between ChE and errors would 
be negative. 
2. The second consideration which suggested
the desirability of working with an
S1 X S3 cross derived from the same reasoning.
Our previous tests involving correlational
analysis were hampered by the restricted
range of ChE activity within the S1 and S3
strains. It was thought that the K strain
would be more heterogeneous than the S1 or
the S3 strain, and that using the K strain
would therefore increase the probability of
obtaining a significant correlation, if an
intrinsic correlation between behavior and ChE
did in fact exist.


Before the K strain was ready for testing, some
unexpected (and premonitory) results were obtained by
McGaugh (1959) in our laboratories. He used descendants
of crosses (the S13 and M: strains) established a number
of years ago between the S1 and the S3 strains. When
these animals were tested on a 14-unit alley maze,
McGaugh found a significant positive correlation between
cortical ChE activity and errors (r = .47, p < .05). On
the Lashley III Maze, he found a correlation that was
not significant, but that was also positive (n = 13,
r = .28, p > .30). McGaugh found, in other words, some
indication that with the crossed strains the higher the
ChE activity, the worse was the animal's learning
performance.

Our results with the K strain (obtained soon after
McGaugh's work) are shown in the center columns of
Table 1. The animals were tested in eight subgroups. In
seven of the eight, positive correlations between ChE
activity and errors were found. Thus we had replicated
and extended the generality of McGaugh's observation to
other learning tasks. For each test, the correlations of
the subgroups were combined, using Fisher's r to z
transformation. The sizes of the combined correlations
tend to be small, and only that for the Dashiell Maze is
significant. The consistently positive direction of the
correlations was puzzling. It will be remembered that we
had been prepared for one of two findings: If our second
hypothesis was valid, we should have found negative
correlations between errors and ChE activity (the higher
the ChE activity, the fewer the errors). If, on the
other hand, our hypothesis was wrong and no intrinsic
relationship existed between ChE activity and learning
ability, we should have found low and inconsistent
correlations between the two variables. But our data, as
well as those of McGaugh, did not follow either
expectation. Whatever the final interpretation of these
data would turn out to be, they cast serious doubt on
the validity of our second hypothesis.
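
The combination of subgroup correlations by Fisher's r to z transformation, mentioned above, can be sketched as follows. The text does not say how the subgroups were weighted, so the weighting by n - 3 used here is an assumption (it is the conventional choice), and the function name `combine_correlations` and the example values are hypothetical rather than taken from Table 1.

```python
import math


def combine_correlations(subgroups):
    """Combine subgroup correlations with Fisher's r-to-z transformation.

    subgroups: list of (r, n) pairs, each with n > 3.
    Returns (combined_r, z_statistic); the z statistic is approximately
    standard normal when the true correlation is zero.
    """
    # Assumed weighting: each z is weighted by n - 3, the reciprocal of
    # its approximate sampling variance.
    weights = [n - 3 for _, n in subgroups]
    z_values = [math.atanh(r) for r, _ in subgroups]   # r -> z
    total_weight = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    combined_r = math.tanh(mean_z)                     # z -> r
    z_statistic = mean_z * math.sqrt(total_weight)
    return combined_r, z_statistic


# Hypothetical subgroups: (correlation, number of animals).
example = [(0.30, 12), (0.10, 10), (0.25, 11)]
combined_r, z_statistic = combine_correlations(example)
print(round(combined_r, 3), round(z_statistic, 3))
```

Averaging in the z scale rather than averaging the r values directly is what makes the combined estimate and its significance test reasonable for small subgroups.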
Before considering this position further, however, let
us look at the results obtained when the Roderick high-
and low-ChE strains were tested behaviorally. These
results are shown in the right half of Table 1.
Following our second hypothesis, we had predicted that
the high-ChE strain of each pair would be superior in
learning ability to the low-ChE strain derived from the
same foundation stock. However, as can be seen in
Table 1, in only one of the six comparisons was there
any indication that this prediction held (RDH better
than RDL on the Dashiell Maze). And this difference was
negligible and statistically insignificant. On the other
hand, in four cases the low-ChE strain made fewer errors
than the high-ChE strain, and three of these differences
were statistically significant (using a two-tailed
test).
We were now faced with a paradox. For the S1 and S3
strains it appeared that the higher the ChE activity,
the better was the learning; for the crossed strains
(S13, Ma, and K) and for the Roderick strains the higher
the ChE activity, the worse was the learning. Clearly
our second hypothesis, which demanded a positive
correlation between ChE activity and learning ability,
could not be maintained. Just as clearly, it seemed to
us, ChE was implicated in learning ability, but the
nature of the relationship between ChE and learning
ability seemed to differ among strains!


THIRD HYPOTHESIS 


Before all of these results were in, we, together with
McGaugh, had begun to explore various revisions of our
second hypothesis which would be able to encompass both
the differences between the S1 and S3 strains and the
results with the other strains. In considering the
possible mechanism which might lie behind varying
relations of ChE activity to learning ability, we were
led to re-examine one of the two assumptions basic to
our hypothesis: the assumed relation between ChE and
ACh. As we have already pointed out, both our first and
second hypotheses had rested upon the assumption that
ChE activity afforded a good index to the level of ACh
functioning. We had accepted this assumption
provisionally, pointing out that it should be subjected
to a direct test. It then occurred to us that if we
revised this basic assumption, a good part of our
paradoxical data could be made comprehensible. We
therefore made the assumption that ACh and ChE were
under relatively separate genetic controls, and
therefore ChE activity would not be a good index to ACh
functioning. Our other basic assumption, that the ACh
transmission system, and therefore ChE, was involved in
learning ability, was retained. With these assumptions,
our third hypothesis reads as follows: learning capacity
is related to the levels of both ACh and ChE, such that,
within limits, the greater the amount of ACh functioning
at the synapse, the greater the efficiency of
transmission and, consequently, the greater the learning
ability. Greater ACh functioning can be achieved in
either of two ways (or by a combination of the two): the
more ACh available and released, the greater is the
functioning; with a particular amount of ACh released,
the lower the activity of ChE in breaking down the ACh,
the greater is the functioning. This formulation is a
revision of several preliminary statements previously
made (Krech, Rosenzweig, & Bennett, 1958; McGaugh, 1959;
Rosenzweig, et al., 1958c).

This third hypothesis, we believe, can handle the data
obtained with the Tryon strains (S1 and S3) and with the
Roderick and crossed strains.

For the Tryon strains the reasoning is as follows:
Tryon's animals that were bred to make few errors in
solving a maze would be expected to have nearly optimal
ACh functioning. Presumably this would mean relatively
high levels of both ACh and ChE, making for sure and
rapid synaptic transmission. His animals bred to make
many errors would be expected to have unfavorable ACh
functioning. Presumably this would mean lower levels of
both ACh and ChE, making for less certain and slower
transmission.


In the case of the four Roderick strains ... has not
been reported; that is, there appears to be no evidence
that altering ChE activity will affect the rate of
synthesis of ACh. Selection for ChE activity could mean
an indirect selection for ACh turnover (despite genetic
independence of ChE and ACh), since an individual with
greater release of ACh would thereby have a compensatory
increase in synthesis of ChE. Nevertheless, since there
are other factors controlling the level of ChE activity,
genetic selection for ChE should affect ChE more
directly and more strongly than it affects ACh. Thus,
the high- and low-ChE strains of Roderick might be
expected to show little or no difference in ACh. The
high-ChE strains might then suffer from too rapid
breakdown of ACh at the synapse, making transmission
less certain and learning less effective. In the low-ChE
strains, released ACh might be allowed to work over a
slightly longer period before being broken down, thus
rendering synaptic transmission more certain and
learning more effective.

For the S13, the Me, and the K strains we can reason in
a similar manner. If ACh metabolism and ChE activity are
genetically independent, there would be random
reassortment of the genetic determinants for ACh and ChE
levels when the Tryon strains are crossed and interbred
for several generations. This would mean that any group
of descendants of the S1 X S3 crosses which showed a
high ChE activity would be likely to include animals of
high, medium, and low ACh concentration. In other words,
on the average, the high-ChE animals of the crossed
strain would have a medium concentration of ACh. In the
same way, on the average, low-ChE animals of the crossed
strain would also have a medium concentration of ACh.
From this it follows that, on the average, for the
high-ChE animals, the available and released ACh would
be too low for optimal ACh functioning, while for the
low-ChE animals released ACh could function over a
relatively long period of time. Thus we would expect to
find, on the average (but with numerous exceptions),
that the high-ChE crosses would show less efficient
synaptic transmission than the low-ChE and therefore
poorer learning. This would result in low positive
correlations between ChE activity and errors in learning
tasks within animals of the S1 X S3 strains.

TABLE 2
ACETYLCHOLINE CONCENTRATIONS AND CHOLINESTERASE ACTIVITY
FOR SIX STRAINS

Strain    N    ACh concentration    N    ChE activity    Ratio ACh/ChE

S: no Hie] Do iu “187
RCL i mo it ee “169
RDE 168.0 S ape | lias

... /gm.
ChE activity is expressed in moles ACh x10! hydrolyzed/min/mg.
... concentration for a strain by its ChE activity,
obtained under standard conditions. ... expresses the
time required for th... In other words, it would take
about ... brain, if this were done under our assay
conditions, ...

That this third hypothesis could cover all of our data
is not too surprising, since it was deliberately
tailored to fit data already available. However, this
hypothesis did not have to remain a post-hoc
rationalization. It could be put to independent test,
since it generated new predictions about strain
differences in brain chemistry about which no
observations existed. Specifically, this hypothesis
demanded that the S1 strain have greater ACh than the S3
strain, while smaller differences in ACh (or none at
all) should exist within the RDH-RDL pair and the
RCH-RCL pair. We ventured this prediction soon after we
had arrived at our third hypothesis but before we had
the means for its testing (Krech, et al., 1958;
Rosenzweig, et al., 1958c). The collaboration of James
Crossland, of St. Andrews University, as USPHS
consultant in our laboratories during the summer of
1959, allowed us to carry out the ACh assays required
for the testing of these predictions.
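
The reasoning given earlier for the crossed strains can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis of the actual data: it assumes that ACh and ChE levels assort independently, that "ACh functioning" can be represented by the quotient ACh/ChE, and that error scores are inversely proportional to that quotient plus random variation. The function names and all numerical values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_cross(n_animals=2000, seed=1):
    """Correlation between ChE and errors in a simulated crossed strain."""
    rng = random.Random(seed)
    che_levels, error_scores = [], []
    for _ in range(n_animals):
        # Independent assortment: ACh and ChE are drawn separately.
        ach = rng.uniform(0.8, 1.2)
        che = rng.uniform(0.8, 1.2)
        functioning = ach / che            # assumed index of ACh functioning
        # Assumed: more functioning means fewer errors, plus other factors.
        errors = 30.0 / functioning + rng.gauss(0.0, 8.0)
        che_levels.append(che)
        error_scores.append(errors)
    return pearson(che_levels, error_scores)


print(round(simulate_cross(), 2))
```

Under these assumptions the correlation between ChE activity and errors comes out positive but far from perfect, because ACh level and the other sources of variation in errors are unrelated to ChE in the simulated cross.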

To preserve the highly labile ACh, it was 
necessary to work with frozen tissue, and this 
made it extremely difficult to obtain separate 
cortical and subcortical samples. The ACh 
assays were therefore carried out on tissue 
samples which included all of the brain an- 
terior to the pons and cerebellum (the pons, 
cerebellum, and medulla being excluded). The 
analytical techniques are summarized in 
Bennett, Crossland, Krech, and Rosenzweig 
(1960). ChE was then analyzed on brain samples
comparable to those employed for the
ACh assays.


The results of these analyses are presented in Table 2.
Two major points can be seen from the ACh data in the
third column of the table: (a) The S1 strain has a
significantly greater concentration of ACh than does the
S3 strain, as we had predicted. The difference is about
14% and is significant at the .001 level of confidence.
These data appear to be the first evidence for strain
differences in ACh concentrations. (b) Neither of the
pairs of Roderick strains shows a difference this large,
again following the prediction. The RCH rats show a
concentration of ACh about 9% greater than that of the
RCL rats, and this difference is significant at the .05
level of confidence. The difference between the RDH and
RDL strains is less than 1% and does not approach
significance.


The data for ChE activity are presented here for ...
comparable to those ... value by 23%, and the RDH value
exceeds that of the RDL by 19%. These results appear to
confirm ... activity. ... RDL strains ... concentrations
... under independent genetic control. In any event, it
is clear that, among strains, ChE activity does not
afford a valid index of the concentration of ACh for the
brain, as sampled in our assay.

Since the ACh and ChE measures 
may be largely independent among 
strains, it is necessary to determine 
both in order to characterize the 
functioning of the biochemical sys- 
tem of which they are components. 
(Information about the third mem- 
ber of the system, ChA, may be of 
further value, and we are now attempting to obtain it.)
The S1 strain exceeds the S3 strain significantly in
both ACh concentration and ChE activity. As we have
pointed out earlier, this combination should make for
more rapid and more certain synaptic transmission in the
S1 strain. This more efficient transmission should, in
turn, make for superior learning capacity. In the case
of the paired Roderick strains, the greater ChE activity
of the high-ChE strain is not accompanied by an equally
great relative increase in ACh concentration. This, we
have suggested, should bring about more rapid
destruction and less persistent action of released ACh
at the synapse in the Roderick high-ChE strains, and
consequently should result in less certain synaptic
conduction.

It will be seen that we have restricted our comparisons
to strains that are derived from a common foundation
stock (S1 vs. S3, RCH vs. RCL, RDH vs. RDL). This is
done because we consider ACh functioning to be only one
of many factors which determine learning capacity.
Within paired strains, the other factors determining
learning capacity are likely to be in common, only the
selected physiological characteristic and factors
closely related to it differing. The comparison between
paired strains, while still somewhat risky, reduces the
ambiguity of conclusions. If strains deriving from
different foundation stocks were compared, the attempt
to relate behavioral differences to any one of the large
variety of physiological differences existing between
the strains would be perilous.

The last column of Table 2 presents ratios of the ACh
and ChE values for each strain. While some of the
differences within pairs of strains are small, it is
interesting to note that within each pair, the strain
with the greater ratio is the one which we have
previously seen (Table 1) to be superior in learning
behavior. These differences in time required for the
transmitter substance to be hydrolyzed by its enzyme
thus accord with the behavioral data. That is, for the
strain that learns better, the ACh continues to function
for a somewhat longer time at the synapse before being
hydrolyzed. Nevertheless, we do not wish to put much
emphasis upon the ratio, for two reasons: (a) The ratio
does not preserve any measure of the absolute levels of
ACh and ChE, and these levels are of considerable
importance. If two strains (or individuals) had similar
ratios, but one was high in both ACh and ChE while the
other was low in both, we would expect to find the first
superior to the second in problem-solving ability. This
appears to be essentially the case with the S1 and S3
strains, whose ratios are similar. (b) Use of a ratio
might imply that all of the ACh could be acted upon by
all of the ChE. Actually we know that both substances
show distinct patterns of distribution among discrete
brain regions, and there is also reason to believe that
different regions are of decidedly different importance
for learning. Thus these ratios can provide only gross
estimates of the relative length of time required for
destruction of the transmitter substance at synapses
involved in learning.
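
Reason (a) above can be made concrete with a few lines of arithmetic. The values below are hypothetical, chosen only for illustration, and are not taken from Table 2; the function name `ach_che_ratio` is likewise an assumption.

```python
def ach_che_ratio(ach, che):
    """Ratio of ACh concentration to ChE activity (arbitrary units)."""
    return ach / che


# Two hypothetical strains: one high in both measures, one low in both.
high_strain = {"ach": 1.2, "che": 1.2}
low_strain = {"ach": 0.8, "che": 0.8}

ratio_high = ach_che_ratio(high_strain["ach"], high_strain["che"])
ratio_low = ach_che_ratio(low_strain["ach"], low_strain["che"])

# The ratios are identical although the absolute levels differ by half
# again, so the ratio alone cannot separate the two strains.
print(ratio_high, ratio_low)
```

This is why the ratio is best read alongside the absolute ACh and ChE values rather than in place of them.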


CURRENT AND PROJECTED STUDIES


The third hypothesis, like its 
predecessors, is only provisional and 
is at present being subjected to fur- 
ther tests. In this section we will 
indicate further chemical and physio- 
logical tests and genetic selection 
studies, in progress and projected. 

The chemical tests are concerned 
with the validity of our conclusion 
that ACh concentration and ChE 
activity are largely independently 
controlled. The large brain sample 
which we used in assaying for ACh 
may have obscured positive relations 
between ACh and ChE activity be- 
tween strains for defined subregions of 
the brain that are important in learn- 
ing. If such positive relations exist, 
they would provide evidence against 
our hypothesis. This makes it de- 
sirable to obtain values for ACh and 
ChE in subregions of the brain. We 
are now doing ChE analyses on such 
samples. Since, as we have indicated, 
we cannot take precisely defined 
sections when the brain has been 
frozen for ACh assay, we are attempt- 
ing to determine, in collaboration 
with Crossland, whether ChA activ- 
ity can be used as an index to ACh 
concentration. Since analysis of ChA
activity does not require a frozen 
brain, the assay for ChA can be done 
on precise brain samples. 

A physiological test frequently 
used to provide a measure of cortical 
excitability consists of delivering 
electrical shocks through the head 
and determining the threshold for 
seizures. With this minimal threshold
seizure test, the S1 strain has
been found to have a lower threshold
and therefore greater cortical excitability
than the S3 strain (Woolley,
Rosenzweig, Krech, Bennett, & Ti- 
miras, 1960). This is, of course, 
consistent with our chemical findings 
about the differences between these 
two strains, and we plan to subject 
the Roderick strains to this electro- 
physiological test. We should expect 
the RCH to have a somewhat higher 
threshold than the RCL strain, and 
the RDH somewhat higher than the 
RDL. 

A genetic experiment is in progress 
to test our third hypothesis. This 
experiment is essentially patterned 
after Tryon’s experiment but with 
concurrent analyses of brain bio- 
chemistry. It is being conducted by 
Richard Olson, a graduate student in 
the genetics department. From each 
of two heterogeneous foundation 
stocks, one line is being developed
for superior learning and one for inferior
learning in the Lashley III
Maze. In each generation, the brains
of half of the animals are being assayed
for ACh concentration and the
other half are being analyzed for ChE
activity. The chemical measurements
will not enter into the selection program,
which is based entirely upon the
behavioral scores. The aim of the
experiment is to determine whether
progressive selection for learning
capacity will entail concurrent changes
in ACh and ChE similar to the differences
found b... maze-dull strains.
Another genetic ex...
we hope to undert...
(... and decreases of ACh metabolism
lead to similar changes in ChE activity;
Krech, et al., 1960). ...
genetic selection for high ACh concentrations
alone should yield a
strain that is high in both ACh and
ChE, the change in ChE being an
instance of induced (nongenetic)
enzymic response. Similarly,
selection for low ACh concentrations
should yield a strain low in both ACh
and ChE. We would predict that the
former strain will be superior to the
latter strain on learning tests.


CONCLUSION 


In the attempt to test our general
hypothesis, that variation in brain
chemistry is a major determinant of
variation in adaptive behavior among
normal individuals, we have rejected
two specific hypotheses and
are now testing a third one. Testing
each hypothesis has produced new
observations about brain chemistry,
about behavior, and about relations
between brain chemistry and
behavior.

Some of the main findings of our
laboratories are these: (a) Strains
and individuals may differ significantly
in brain ACh concentration
and ChE activity. (b) Rats can be
bred selectively for brain ChE activity.
(c) Selective breeding for an
enzyme (ChE) may have little or no
effect on the concentration of its
substrate (ACh). (d) Measures of
brain chemistry (ACh concentration,
ChE activity, LDH activity, percent
protein) show relatively little variation
among individuals, compared with
the variability found in other organs.
(e) The pattern of distribution of
ChE activity among brain regions
differs from that of LDH activity and
percent protein. (f) ChE activity is a
general characteristic of the brain,
since values for one region show high
correlations with those for other
regions. (g) Pentobarbital sodium 
causes rats to fixate on one attempted 
solution of a problem rather than to 
try other possible solutions. (h)
Tryon's genetic selection program
has been shown to have had rather
general results, since the S₁ strain is
superior to the S₃ strain on three
behavioral tests, none of which was 
employed in the original selection. 
(i) Strains bred for high cortical ChE 
activity tend to be inferior on be- 
havioral tests to strains from the 
same foundation stocks bred for low 
ChE activity. (j) In strains descended
from crosses between the S₁ and
S₃ strains, error scores and cortical
ChE tend to be correlated positively. 

The relations between variation in 
brain chemistry and variation in 
problem-solving ability for three 
paired strains (S₁ vs. S₃, RCH vs.
RCL, RDH vs. RDL) and for a 
crossed strain (K) can all be encom- 
passed by our current hypothesis. 
As tests of this hypothesis lead to 
discovery of more facts about brain 
chemistry, behavior, and their rela- 
tions, undoubtedly further progress 
in theory will be needed to summarize 
and to stimulate further progress in 
experimentation.

Considering all the evidence available
now, the writers themselves
cannot doubt that an extremely high 
degree of association exists between 
cigarette smoking and lung cancer. 
This association has been reported 
independently by many different 
investigators who cannot all possibly 
have committed the very same errors. 
The evidence appears to support 
strongly the hypothesis that cigarette 
smoking is a major causative factor 
in lung cancer. The present status of 
the cause-effect controversy is de- 
scribed most cogently in the articles 
by Backett (1958), Hammond (1958), 
Little (1957), and Rutstein (1957), 
the review by the Study Group on 
Smoking and Health (1957), the 
book by Northrup (1957), and the 
public statement by Leroy E. Burney.¹
As Rutstein (1957) has pointed
out, the general health problem posed
by excessive cigarette smoking is of
sufficient magnitude to warrant consideration
now of what preventive
measures may be socially possible
and desirable. Although data now
clearly indicate that any reduction in
the number of cigarettes smoked by
an individual reduces the health
hazard (Hammond, 1958, Figure 6,
p. 338), the percentage of smokers
who have given up smoking is quite
low, falling somewhere between 10
and 18.9% (Haenszel, Shimkin, &
Miller, 1956, p. 24; Hammond &
Percy, 1958, p. 2956).

In view of the current widespread

1 Surgeon General, Public Health Service,
United States Department of Health, Education,
and Welfare, July 12, 1957.


interest in the smoker, it has seemed 
to us wise to review the literature on 
what is known about the psycho- 
logical, personal, social, and situa- 
tional characteristics of smokers and 
nonsmokers. Few systematic studies 
exist which are concerned directly 
with this issue. Some of the pertinent 
findings are summarized below. Two 
of the best studies apparently have 
not yet received wide attention. 

The first of these was a study by 
Haenszel et al. (1956). As a supple- 
ment to the United States Bureau of 
the Census Current Population Sur- 
vey (CPS) for February 1955, smok- 
ing histories were collected from ap- 
proximately 40,000 men and women 
18 years of age and over. Survey 
data were adjusted to include unsur- 
veyed population groups (Armed 
Forces, teenagers under 18, and 
institutional population). 

It was possible with apparently a 
fair degree of accuracy for the au- 
thors to generalize from this repre- 
sentative sample of 40,000 subjects 
to the smoking habits of the total 
adult population of the United States 
(some 105 million adults). Adjusted 
estimates largely based on data from 
this 1955 survey, the 1952 survey of 
some 187,000 older men made by 
Hammond and Horn for the Ameri- 
can Cancer Society (1954, 1958), and 
a variety of other statistics-gathering 
agencies (Haenszel, et al., 1956) 
indicate that the total number of smokers in 
the United States and in overseas forces as of 
early 1955 was close to 60 million. Of these 
54 million smoked regularly (everyday) and 
the rest occasionally (not everyday). The 
number of regular and occasional smokers in

TABLE 1
AGE AND SMOKINGᵃ
(Regular Cigarette Smokers in the United States by Age and Sex)

                        Males                          Females                        Both Sexes
Age           Total N      Smokers      %    Total N      Smokers      %    Total N       Smokers      %
18-24         ...          ...          46   7,460,000    2,346,000    31   12,865,000    4,833,000    38
25-44         ...          12,475,000   57   23,680,000   8,933,000    38   45,500,000    ...          47
45-64         16,034,000   7,250,000    ...  16,695,000   3,295,000    20   32,729,000    ...          ...
65 and over   6,322,000    1,296,000    20   7,261,000    ...          5    ...           ...          12
Total         49,581,000   23,411,000   47   55,096,000   14,953,000   27   104,677,000   38,364,000   ...

a Abstracted from Table 2, p. 57, Haenszel, Shimkin, and Miller (1956).


the civilian, noninstitutional population, 18 ...
was about 51½ million ...
who smoked occasionally. The number of
female smokers totaled 14 million regular and
2 million occasional (pp. 11-15).

The numbers shown in the tables 
presented below vary slightly from 
these, due to the assumptions made, 
especially regarding the unsurveyed
(institutional, etc.) groups (Sackrin &
Conover, 1957, p. 1). 


... Haenszel et al. had smoking histories
and on whom ...
Sackrin and Conover to obtain additional
information on income.

A representative ... (18 years of
age and over) in Buffalo ...
selected characteristics. From this ...
903 cigarette smokers were
matched with 903 nonsmokers with
respect to age, sex, race, and ...
status. The chief findings of these
three major studies, as well as
numerous others, will now be summarized
under their appropriate headings.


PERSONAL-SITUATIONAL VARIABLES


Age and cigarette smoking. The data
in our Table 1 have been abstracted
from Haenszel et al. (1956). ...
very few males below the age ...
and probably no females in ...
young age group smoke. For Americans,
smoking apparently begins in
the early and late teens. ...
data are given for the ages ...
years. While smoking patterns have
changed markedly in the past ...
years, and are continuing to change,
in the year 1955 age (beyond adolescence)
had a curvilinear relationship
to smoking. Table 1 shows that
for the two sexes combined, in the
younger age ranges (18-24), approximately
one out of three (38%) individuals
were smokers; this increases
to one out of two (47%) individuals
in the early middle age group (25-44);
and then declines again in the
older group (65 and over) to ...
one out of every eight (12%). Additional
data on age and smoking are
to be found in other sources (Hammond
& Horn, 1954, 1958; ...,
1958; McArthur, Waldron, & Dickson,
1958; Sackrin & Conover, 1957).

TABLE 2
SEX AND SMOKINGᵃ
(Smokers in the United States Aged 18 and Over)

                                                     Males                Females
                                               Total N       %       Total N       %
Total Groupᵇ                                   49,581,000            55,096,000
Regular Smoker (cig., cigars, and/or pipes)    33,566,000    68      14,953,000    27
Smoked Occasionally                             2,448,000     5       2,259,000     4
Never Smoked                                   10,704,000    22      35,785,000    65
No Data                                         2,863,000     6       2,099,000     4
Regular Cigarette (only) Smokers               23,411,000    47      14,953,000    27

a Abstracted from Table 2, p. 57, Haenszel, Shimkin, and Miller (1956).
b This total does not equal the sum of the numbers shown below because some smokers, primarily among males,
often smoke pipes and cigars, as well as cigarettes, and thus any given individual may be represented twice, once
as a Regular Smoker and once as a Regular Cigarette (only) Smoker.


Sex and smoking. As is suggested 
in Table 1 and summarized in Table 
2, fewer women (27%) over age 18 
than men (68%) over 18 smoke 
regularly. In women, smoking is al- 
most exclusively confined to ciga- 
rettes, whereas with men 68% smoke 
regularly in one form or another, 
while only 47% regularly smoke 
cigarettes exclusively. Although not 
shown in our Table 2, the data 
gathered by Haenszel et al. further 
showed that in the year 1955, in the 
18-24 age group, 34% of males had 
started smoking by age 18, and 50% 
had begun by age 24. For females in 
the 18-24 age group, 16% had 
started smoking by age 18, and 36% 
had begun by age 24 (Haenszel, et al., 
1956, p. 56). 
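
The percentages in Table 2 follow directly from the tabled counts. As a minimal sketch (Python; the dictionary layout and function names are illustrative assumptions and are not part of the report by Haenszel et al.), the following recomputes each percentage and confirms that the four mutually exclusive categories add up to the total group for each sex:

```python
# Counts as printed in Table 2 (abstracted from Haenszel et al., 1956).
# The dictionary layout and the function names are illustrative assumptions.
TOTALS = {"Males": 49_581_000, "Females": 55_096_000}

CATEGORIES = {
    "Males": {
        "Regular Smoker": 33_566_000,
        "Smoked Occasionally": 2_448_000,
        "Never Smoked": 10_704_000,
        "No Data": 2_863_000,
    },
    "Females": {
        "Regular Smoker": 14_953_000,
        "Smoked Occasionally": 2_259_000,
        "Never Smoked": 35_785_000,
        "No Data": 2_099_000,
    },
}


def percent_of_total(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def table_2_percentages():
    """Recompute the percentage column of Table 2 for each sex."""
    result = {}
    for sex, counts in CATEGORIES.items():
        # The four categories are mutually exclusive, so they should
        # add up to the total group for that sex.
        if sum(counts.values()) != TOTALS[sex]:
            raise ValueError(f"categories do not sum to total for {sex}")
        result[sex] = {
            name: percent_of_total(n, TOTALS[sex]) for name, n in counts.items()
        }
    return result


if __name__ == "__main__":
    for sex, percentages in table_2_percentages().items():
        print(sex, percentages)
```

The Regular Cigarette (only) row is left out of the sum check because those individuals are already counted among the regular smokers.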

Although the figures on percent
smokers for young males have changed
little in the past 50 years, young women
have shown a substantial increase.
Whereas few women now in
their 40’s and 50’s smoked at an
early age, 20% of the women now in
their 20’s were regular smokers by
the age of 18.5 (Haenszel, et al., 1956,
p. 17). Thus, as is the case for sexual
habits (Kinsey), smoking habits of
women born in the past several dec- 
ades are very different from those of 
their mothers and grandmothers. A 
progressive loosening of “moralistic” 
attitudes in the past 50 years has 
been suggested as a contributing 
factor. 

Evidence that the relationships 
shown in Tables 1 and 2 between 
smoking and age and sex are not 
unique to the inhabitants of the 
United States comes from a recently 
completed study of the smoking hab- 
its of the entire population of Israel 
(Kallner, in press). Except for minor 
differences, the age and sex relation- 
ships were similar. 

Race and cigarette smoking. The 
United States data of Haenszel et al. 
(1956, pp. 36-38) permit the follow- 
ing conclusions: The age and sex dif- 
ferences noted above hold for non- 
whites as well as whites. In the 
United States, there is no difference 
between the percentage of whites and 
the percentage of nonwhites who 
smoke. However, quantity of smoking
differs between whites and nonwhites.
Among smokers, there is a
significantly greater percentage of
heavy cigarette smokers (more than
one pack a day) among whites
than there is among nonwhites. The
percentage of heavy smokers is 13.3
for white males and 6.9 for nonwhite
males. A similar difference
is found for females. Indirect
evidence suggests an economic factor 
is involved in these differences. 

Although Lilienfeld’s (1959) smok- 
ers and nonsmokers were matched for 
race, his data show that, despite the 
fact that their own birthplace did not 
differentiate smokers from nonsmok- 
ers, the parents of nonsmokers were 
found to be more often foreign-born 
than those of smokers (p <.001). The 
author states that the implication of 
this difference is not clear. 

Marital status and cigarette smok- 
ing. The data from Haenszel et al. 
(1956, pp. 44-46) permit these conclusions:
For all ages, and for both
sexes, there is a greater percentage
of smoking among divorced and
widowed individuals than there is
among both married and never-married
individuals. Among single
individuals (never-married) under 
age 45, one finds a smaller percentage 
of smokers than among married in- 
dividuals of comparable age. There 
are more single individuals who have 
never smoked than there are married 
individuals who have never smoked 
(in the 18-45 age group). There is a 
slight reversal of this trend in the 
age group over 55.

Lilienfeld’s (1959) results bear out 
one aspect of the findings of Haenszel 
et al., in that he found that, although 
present marital status does not dif- 
ferentiate smokers from nonsmokers, 
the smokers had previously married
significantly more often (p <.001). 

Occupation and cigarette smoking. 
Among males a slightly larger pro- 
portion of the unemployed than the 
employed smoke. Professional and
technical workers, although having
the highest incomes, have a low
smoking proportion (also see section
below on Income). There seems to be
a complex relationship between social
class and smoking patterns. White
collar groups (professional workers,
managers, etc.) have fewer smokers
than are found among craftsmen,
foremen, salespersons, operatives,
and similar groups (for further data
on this “social class” variable see section
below on Socioeconomic status).
It was found that military life is associated
with a higher frequency of
smoking for all age groups (25-65 and
over); veteran age groups were found
to have more smokers than non-veteran
age groups (Haenszel, et al.,
1956, pp. 38-44; Sackrin & Conover,
1957, p. 5).

In the sample of 1,806 Buffalo
adults, Lilienfeld found that smokers
change jobs significantly more often
than do nonsmokers (p <.001).

Additional (longitudinal research)
findings on the relationship between
occupation and cigarette smoking
have been reported recently by
Heath (1958) and McArthur et al.
(1958). These data are shown below
in Table 5.

Urban-rural residence and cigarette
smoking. Present ...
residence differentiates smokers from
nonsmokers sharply. There is a
smaller percentage of smokers of both
sexes and at all ages in the rural farm
population than in either the rural
nonfarm or in the city population.
Rural nonfarm persons resemble
closely urban dwellers in their smoking
habits. Among males there is
little or no variation in smoking patterns
from one region of the United
States to another. On the other hand,
the 1955 Census results show that
cigarette smoking is more prevalent
among women in northeastern and

TABLE 3
INCOME AND REGULAR CIGARETTE SMOKINGᵃ

            Males                                Females
1954 Income          % Smokers      1954 Income                      % Smokers
Less than $1,000     39             Less than $1,000                 19
$2,000-$7,000        56-60          $2,000-$4,000                    28-33
$7,000 and Over      50             $4,000 and Over                  32
                                    No Money Income (Homemakers)     23

a Abstracted from Sackrin and Conover (1957, pp. 2-3).


western states than elsewhere in the 
United States (Haenszel, et al., 1956; 
Sackrin & Conover, 1957). 

Income and cigarette smoking. The 
main findings of Sackrin and Conover 
(1957), from their study of 18,000 
Americans, have been abstracted in 
Table 3. It is clear from the results 
in the table that regular cigarette 
smoking is, in fact, related to re- 
ported annual income. 


The proportion of males who smoke regu- 
larly increases from 39% of the men whose 
annual income is less than $1,000, to 56-60% 
of those in four income brackets from $2,000 
to $7,000. For men receiving $7,000 and over,
the proportion of regular cigarette smokers 
drops to a little over 50% (p. 2). 

Less than a fifth of the women whose per- 
sonal incomes are less than $1,000 a year 
smoke regularly, but the proportion increases 
to about a third for women receiving incomes 
of $3,000 or more. About a fourth of the 
women who receive no personal income 
(largely home-makers) smoke cigarettes (p. 5). 

Income appears to have a greater effect on 
rates of smoking than on the percentage of 
regular smokers, although differences asso- 
ciated with age and other population charac- 
teristics were noted also. The majority of all 
regular cigarette smokers smoke from 10 to 20 
cigarettes daily, generally regardless of age or 
income level. Enough men exceed this range 
to bring the overall daily average to 21-22 
cigarettes for male smokers. The daily aver- 
age for women... is 14-16 cigarettes daily 
(p. 8). 


Socioeconomic status and cigarette 
smoking. From our own department,
Allen (1958) reported the smoking 


behavior of three different groups. 
The first was a group of 40 psychiat- 
ric patients from the Massachusetts 
General Hospital whom we studied 
by means of individual psychiatric 
interview of a special standardized 
kind and by a battery of individually 
administered objective and projective 
personality tests. Of this group, 31 
were cigarette smokers, 9 were non- 
smokers. The second group consisted 
of 114 female student nurses from a 
school of nursing located in the north- 
west, of which 50 were cigarette 
smokers and 64 were nonsmokers. 
The third group included 140 male 
and female university undergraduate 
students, also from the northwest. 
Seventeen of the females were smok- 
ers, 31 were nonsmokers; 54 of the 
males were smokers, 38 were non- 
smokers. Characteristics of these 
groups are given in Allen (1958) and 
Matarazzo, Matarazzo, Saslow, and 
Phillips (1958). 

The measure of socioeconomic 
status used in each of the three 
groups was the Hollingshead scale 
(Hollingshead & Redlich, 1958), an 
index which combines own (or parents’)
education and present occupation
to yield a single socioeconomic
status score.

As can be seen in the first row of 
Table 4 below, no relationship was 
found between socioeconomic status
and smoking in any of the three
groups. Thus, studying a total of 
294 individuals of widely varying 
ages and a considerable range in 
socioeconomic status yielded the 
finding that smoking seems to be un- 
related to this measure of socioeco- 
nomic status. However, a study by 
McArthur et al. (1958), described 
below, indicates that, for Harvard 
undergraduates, “nonsmokers tend
to be lower-middle class in origin,
upwardly mobile, earnest young
men...” (p. 274). These authors
(p. 269) cite a survey published in 
England in 1948 which “suggested 
that nonsmoking was commoner 
among men of the English middle 
class while heavy smoking was com- 
moner among English working-class 
men.” These results appear to con- 
flict with our own findings. Further- 
more, our own findings with the 
Hollingshead measure of socioeco- 
nomic level seem to conflict with the 
finding by Sackrin and Conover of a 
relationship between earned annual 
income and smoking. However, 
when it is remembered that educa- 
tion and occupation (the Hollings- 
head measure) do not correlate 
highly with annual income, and that 
in all probability, neither correlates 
highly with the McArthur et al. 
measure, it is apparent that better 
indices of socioeconomic status are 
required or, in their absence, a study 
employing the available indices with 
the same subjects might shed some 
light on these inconsistencies.

Education and cigarette smoking.
The most direct results bearing on 
this relationship were given by 
Lilienfeld (1959, p. 277). His study 
showed that adult smokers and non- 
smokers do not differ significantly in 
final number of years of schooling 
completed. There were as many 


smokers as nonsmokers who had had 
no schooling, or who had attended 
college, etc.

For people still in school, since age
and number of years of school completed
are obviously correlated up to
about age 25, it can be concluded
that up through high school, college,
and professional school there is an
increase in the percentage of smokers
as grade level of students still in
school increases. A study of 6,3...
college students in 11 Texas colleges
(aged 15-39) by Kirchoff and Rigdon
(1954, p. 296) showed just such an
increase of 30% to 63% during the
four college years. A study by Horn,
Courts, Taylor, and Solomon (1959)
on the smoking habits of the 22,000
high school students in Portland,
Oregon, extends these observations
backward into the high school years.
For boys, smoking increased from
14.5% of the total in the freshman
year to 35.4% in the senior year. The
comparable increase for girls was
from 4.6% in the freshman year to
26.2% in the senior year. Bari...
(1957) provides some anecdotal observations
on the smoking habits of
English schoolboys.

A study made of a group of college
students at Antioch by J. R. Earp
(1936) showed that of 177 students
who smoked, 57% failed to graduate;
of 176 nonsmokers 31.8% failed to
graduate. Vallance (1940-45, p. 139)
correctly points out that this finding
is merely a correlation and reveals
nothing about smoking being a
cause of college failure; individuals
who earn poor grades might be the
ones who take up smoking.

A study of smokers and nonsmokers
as related to achievement
and various personal characteristics
made by R. M. Lynn (1948) showed
that adolescent boys who do not


CHARACTERISTICS OF SMOKERS AND NONSMOKERS 


TABLE 4
MEANS AND RANGES OF SMOKERS AND NONSMOKERS

                          Psychiatric Patients     Student Nurses           University Undergraduates (N = 140)
                          (N = 40)                 (N = 114), Females       Females                  Males
Variable                  Nonsmokers   Smokers     Nonsmokers   Smokers     Nonsmokers   Smokers     Nonsmokers   Smokers
                          (N = 9)      (N = 31)

Socioeconomic Index:
  Mean                    60.2         57.9        48.6         46.2        43.6         42.1        49.8         45.4
  Range                   44-73        14-77       11-73        11-77       11-71        11-73       11-77        11-73
IQ:
  Mean                    93.6         98.8        117.6        118.4       110.9        109.2       107.9        109.2
  Range                   77-109       79-129      103-130      103-129     87-129       92-122      84-130       89-131
Anxiety level:
  Mean                    28.9         25.9        12.3         14.8*       12.0         15.3        11.0         14.7*
  Range                   13-39        6-45        3-26         3-34        5-28         6-45        2-30         1-33
Psychosomatic Symptoms:
  Mean                    ...          13.9        6.3          8.2*        ...          6.1         3.3          3.9
  Range                   2-23         1-44        0-22         0-18        0-14         0-18        0-12         0-19
Cups of Coffee:
  Mean                    2.8          4.2         0.9          2.6**       1.5          ...         1.0          ...
  Range                   0-8          0-15        0-6          0-10        0-10         0-6         0-6          0-12
Liquor Score:
  Mean                    1.22         2.06        1.0          ...         1.3          ...         ...          ...
  Range                   1-2          1-6         1-2          1-2         1-2          1-2         1-5          ...

* Mean differences significant at the .05 level.
** Mean differences significant at the .001 level.

smoke, on the average will gain more 
weight, make higher grades in school, 
fail less often, cause less disciplinary 
trouble, make better scores on psy- 
chological tests, be troubled less with 
respiratory diseases, than the occa- 
sional smoker and the habitual 
smoker. The study showed also that 
smoking and poor scholarship do not 
always go together, in that scholastic 
averages according to age favor the 
nonsmokers in some cases and the 
smokers in others.

Participation in sports and cigarette 
smoking. Lilienfeld (1959) found that
smokers had participated to a
greater extent than nonsmokers in
certain major sports (p<.01), had 
participated more frequently in a 
miscellaneous grouping of other non- 
specified sports (p <.05), and, at the 
time of the interview, were par- 
ticipating in a greater number of 
sports (p <.02). What relationship,
if any, there is to Heath's finding 
that smokers had more combat duty 
in World War II than nonsmokers, 
and Ianni’s finding of more driving 
accidents among smokers, is unclear. 
These studies will be described below. 

Driving accidents and cigarette 
smoking. Data for this variable were
made available to us in a personal
communication by Ianni and Boek²
(from Russell Sage College) who
found, in a well-controlled study,
that in a group of 161 drivers who
had just been in a driving accident,
there was a higher frequency (76%)
of current cigarette smokers than in a
control group of 196 nonaccident 
drivers selected from the same driv- 
ing population which had only 54% 
current cigarette smokers (p com- 
puted from their data is <.001). The 
number of ex-smokers was 13% in 
both the accident and nonaccident 
control group. 
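
The significance level quoted for the Ianni and Boek comparison can be checked approximately. The sketch below (Python) assumes cell counts reconstructed from the rounded percentages (76% of 161 and 54% of 196), which is an assumption because the original counts are unpublished, and applies an uncorrected 2 × 2 chi-square test. The function names are illustrative.

```python
import math


def counts_from_percent(n, percent_smokers):
    """Reconstruct (smokers, nonsmokers) from a group size and a rounded percentage.

    This is an approximation: the true counts may differ by one or two.
    """
    smokers = round(n * percent_smokers / 100)
    return smokers, n - smokers


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # For 1 df, P(X > x) equals P(|Z| > sqrt(x)) for a standard normal Z.
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    accident = counts_from_percent(161, 76)   # drivers who had an accident
    control = counts_from_percent(196, 54)    # nonaccident control drivers
    chi2 = chi_square_2x2(*accident, *control)
    print(f"chi-square = {chi2:.2f}, p = {p_value_df1(chi2):.6f}")
```

With these reconstructed counts the statistic exceeds 10.83, the critical value for p = .001 with 1 degree of freedom, which agrees with the reported p <.001.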


PSYCHOLOGICAL VARIABLES 


IQ and cigarette smoking. The 
three groups studied by us provided 
data for this variable. IQ ...
... younger age
groups shown. As can be seen in
Table 4, and despite the adequacy of 
the IQ range covered by our three 
samples (77 to 131), smokers do not 
differ from nonsmokers on this vari- 
able. Thus it can be concluded that 
IQ is probably not related to smok- 
ing. The relationship between this 
finding and the previously cited finding
that nonsmokers tend to earn
better grades is unclear and points to
the need for further research. However,
it should be remembered that,
as pointed out by Super (1949, p. 90),
“the correlation between intelligence
tests and grades is not especially
high ... the modal r’s being .30 to ...”


Anxiety and cigarette smoking. The


2 Ianni, F. A., & Boek, W. A study of the
relationship between motor vehicle accidents
and certain characteristics of drivers. Unpublished
manuscript, 1958.


three groups investigated by us were 
studied on this variable also. The 
measure of anxiety used in each case 
was the Taylor Manifest Anxiety 
Scale, a reliable and reasonably well 
validated 50-item questionnaire de- 
signed to reflect conscious anxiety 
(Matarazzo, Guze, & Matarazzo, 
1955; Taylor, 1953; Taylor & Spence, 
1952). 

As is seen in Table 4, for the 40 
psychiatric patients the mean anxiety 
score for both the nonsmokers (28.9) 
and smokers (25.9) is very high 
(p<.001) relative to the mean anxiety
scores in the two samples of normals
(means ranging from 11.0 to
15.3), both smokers and nonsmokers.
The higher anxiety scores for the
psychiatric patients are very similar
to the scores reported by us for
other psychiatric populations studied
at the Washington University School
of Medicine Outpatient Clinic (Matarazzo,
et al., 1955); while the lower
mean scores for the two normal
groups also are exactly in the range
typically found for unselected groups
of normals (Matarazzo, et al., 1955;
Taylor, 1953).

The results shown in Table 4 make 
it clear that, for young normal sub- 
jects, smokers have higher anxiety 
scores than nonsmokers. While the 
differences between means are not 
great, they nevertheless reach statistical
significance (p<.05) in two of
the 3 samples, and show a similar
trend in the third (female university
undergraduates). Relative to the
nonsmokers in these two normal
groups, the higher anxiety scores
among the (normal) smokers place
the smoker at a point along the
anxiety continuum which is closer to
that characteristic of psychiatric
patients.

That a similar finding of a higher 
anxiety score among smokers, relative
to nonsmokers, was not obtained
in the psychiatric population may 
well be a reflection of the higher over- 
all level of anxiety found in both 
smokers and nonsmokers in this 
population. The small number 
(nine) of nonsmokers in our psy- 
chiatric population does not permit 
us to establish this point.

Psychological tension and cigarette
smoking. In an interesting study of 
63 (normal) medical-surgical male 
patients and 32 psychiatric male 
patients in a Rhode Island Veterans 
Administration Hospital, Lawton and 
Phillips (1956) used the Cornell 
Medical Index and another specially 
devised questionnaire. The CMI 
is a well-validated 195-item questionnaire
(Brodman, Erdmann, &
Wolff, 1949) covering a wide range of 
somatic and psychological symptoms, 
not unlike the Taylor MAS. The 
authors concluded that among their 
normals, “Heavy Smokers,” when
compared to “Moderate Smokers,”
showed a greater number of signs of
psychological tension (12 p values
from .01 to .05). They concluded 
that 
the present data appear to indicate a very real 
tendency for this present group of heavy 
smokers to exceed the group of moderate 
smokers and abstainers in various indices re- 
lating to presence of “nervous” traits. In both 
number of somatic and psychological complaints,
the heavy smokers seem to resemble
the emotionally disturbed individual more
than do the moderate smokers. In the manner
in which they are willing to describe themselves
[on the Adjective Check List of the
second questionnaire], the heavy smokers ...
[describe themselves as] less agreeable, happy,
and relaxed... [and specifically rate themselves
more often as] nervous and grouchy
(p. 401). 


In addition, it was found that there 
was a significantly higher percentage 
of Heavy Smokers as against Mod- 
erate Smokers in the sample of 32 
psychiatric patients than in these 63
nonpsychiatric medical-surgical patients.

Thus two conclusions from Lawton 
and Phillips’ study are indicated. 
First, that Heavy Smokers among 
the normals are more like the psy- 
chiatric patients by these various in- 
dices, and second, that psychiatric 
status is associated with a higher 
frequency of smoking. 

The first conclusion bears out our 
own observations on anxiety, dis- 
cussed above. The lack of compa- 
rability on the age factor between our 
own two normal samples and our 
psychiatric sample precludes any 
useful test of the second conclusion 
drawn from Lawton and Phillips’ 
data. 

Suggestibility and smoking. Stimu- 
lated by the statement made by 29 
out of 35 college student smokers that 
they had been influenced to take up 
smoking by their friends, Vallance 
(1940-45) studied the suggestibility 
of smokers versus nonsmokers. Using 
Hull’s body sway technique, he 
found among Miami University stu- 
dents (25 smokers and 22 nonsmok- 
ers) just the opposite; i.e., that the 
smokers were less suggestible than 
were the nonsmokers on this meas- 
ure. With the subject having his 
eyes closed while standing erect, the 
suggestion was made by the ex- 
perimenter that the subject was to 
imagine that he was falling forward. 
Under this set of suggestions the 
smokers swayed only 3.83 cm. while 
the nonsmokers swayed 5.27 cm. 
Other measures of suggestibility con- 
ceivably could show different results, 
since it is well-known that Hull’s 
measure is not highly correlated with 
other measures of suggestibility. 

There are such strong a priori 
reasons to think that parental and 
peer influences are strong factors in 
the initiation of the smoking habit
that one certainly should not conclude
from Vallance’s negative results
with a crude laboratory measure of 
suggestibility that the issue is closed. 
As a matter of fact, the study by 
Horn et al. (1959) of 22,000 high 
school students in the Portland, 
Oregon, metropolitan area shows just 
such influences. In addition to many 
other important findings, their data 
reveal that: (a) the percentage of 
smokers is highest among children of 
families in which both parents 
smoke cigarettes, and lowest in 
families in which neither parent has 
been a smoker; (b) the percentage of 
smokers is higher in Catholic paro- 
chial schools than in the city public 
schools, and is lowest in the suburban 
public high schools; (c) the per- 
centage of smokers is highest among 
students who do not participate in 
any school activities; (d) the per- 
centage of smokers in this young age 
group is inversely related to the 
educational level of the parents; (e) 
the percentage of smokers is higher 
among those students who are behind 
their age-equals scholastically, thus 
confirming the findings of Earp 
(1936) and of Lynn (1948). 

Emotional status and cigarette smok- 
ing. As part of his study of 903 
smokers and 903 nonsmokers, Lilien- 
feld (1959) utilized a 31-item ques- 
tionnaire made up from a list devel- 
oped by Stauffer et al. The 31 items 
selected were those which differentiated
a “normal” from a “neurotic”
group. Examination of the 31
items used by Lilienfeld (1959, pp. 
264-268) makes clear that this ques- 
tionnaire utilized items very similar 
to those contained in the MAS, 
CMI, and the Saslow Psychosomatic 
Screening Inventory (see section
below).

Lilienfeld’s findings (p. 269) were 
that “the responses by cigarette 
smokers on the questions concerning
emotional status were consistently
more ‘neurotic’ than those of non- 
smokers.” Nineteen of the 31 items 
reached statistical significance: 11 at 
the .001 level of confidence, 4 at the 
.01 level, 1 at the .02 level, and 3 at 
the .05 level, all in the direction 
stated. The remaining 12 items were 
nondiscriminating. 

An additional finding (p. 278) was
that the smokers had had a significantly
greater number of hospitalizations
(p <.001). The author suggested
that while the excess number
of hospitalizations may reflect neuroticism,
they also may reflect [illegible]
reported associations between [illegible]
and cigarette smoking. Lilienfeld
(p. 276) reports that an analysis of
the reasons for these hospitalizations
is now in progress.

These findings of Lilienfeld, while
in agreement with the findings of
Lawton and Phillips, and our own,
are important for several additional
reasons: (a) the sample was large
(N=1,806) and was an adequate
representation of the entire [illegible]
adult population of a large city; (b)
the sample contained two groups
matched on four important variables
(age, sex, race, and social status);
(c) there was a reliability check on
the findings. Unlike the two other
studies which used selected subgroups
in the population (university undergraduates,
student nurses, psychiatric
patients from a large medical
center, and VA hospital patients),
Lilienfeld’s study dealt with a noninstitutional
sample of adult normals
which was more representative
of the total sample of adults in the
United States than were the other
specialized samples. Yet the fact that
his findings and those of the other
investigators agree is striking.

At this point a comment is in
order about a potential shortcoming
of these three questionnaire studies
and several of the others here being
reviewed: The differences between
smokers and nonsmokers were based
on self-report. While it is probably
true that smokers and nonsmokers
have similar “test-taking attitudes,”
there is always the possibility that
the “response-set” of the two groups
differs in such a way that smokers
have a lower threshold for admitting
“neurotic” items than do nonsmokers.
No study suggesting such a possibil- 
ity is known to us. On the other 
hand, there are sufficient (neurotic) 
indices not based on self-response 
questionnaires which significantly 
differentiated the two groups (e.g. 
Lawton and Phillips’ finding of a 
greater incidence of Heavy Smokers 
in their psychiatric sample in con- 
trast to their nonpsychiatric control 
sample) that, in the absence of data 
suggesting otherwise, it can be pre- 
sumed that smokers and nonsmokers 
have similar test-taking attitudes, 
and that the obtained differences re- 
flect significant differences between 
the two groups. Even if it were 
shown that test-taking attitudes were 
different, this fact, by itself, would 
still be an important psychological 
difference between smokers and non- 
smokers. 

Psychosomatic screening inventory
scores and cigarette smoking. Table
4 contains for the three populations
studied by us mean scores
for smokers and nonsmokers on the
Saslow Psychosomatic Screening Inventory
(Gleser & Ulett, 1952;
Saslow, Counts, & DuBois, 1951).
This test requires the respondent to
check from a list of 23 symptoms
(dealing with bodily and mood
dysfunction) the particular symptoms
he or she experiences in everyday
anxiety and anger situations.

In all three populations studied,
the smokers report a greater number
of psychosomatic symptoms than do
nonsmokers. While these differences
reach statistical significance only in 
the student nurse group, the trend 
is obviously in the same direction 
with the two other samples. 

Coffee and alcohol consumption and
cigarette smoking. In our own three 
groups, coffee intake was scored as 
number of cups consumed per day, 
while alcohol intake was estimated 
on a weekly basis by a crude scale 
that gave differing weights to beer, 
wine, and whiskey. A score of 0 was 
given to nondrinkers while a score of 
5 was assigned to 21 or more glasses 
of beer per week, or 21 or more 
glasses of wine, or 21 or more single 
shots of whiskey. 

The results shown in Table 4 are
all in the same direction, and reach 
statistical significance (p<.001) in 
two of the four comparisons for coffee 
consumption. Thus, smokers con- 
sume more coffee. For liquor, again 
the trend is in the same direction for 
all groups, and the differences reach 
statistical significance (p <.001 and
.05) in two of the four comparisons.
Smokers also consume more alcohol. 

Both these findings (p<.01) have 
been reported independently by McArthur
et al. (1958, p. 269) and Heath (1958, p. 385).

An interesting observation in the
literature on the possible interaction
between coffee and cigarette intake 
is that by Troemel, Davis, and 
Hendley (1951). Studying a phenom- 
enon (dark adaptation), long of in- 
terest to experimental psychologists, 
these authors report an increased 
speed of dark adaptation after in- 
haling cigarette smoke, which could 
be counteracted by the simultaneous 
intake of caffeine in a small dose but 
not by a larger dose of caffeine. 

Little systematic research has appeared
describing the relationship
between food intake and smoking.
However, smokers occasionally report
that smoking helps them keep
down their weight. Brozek and Keys 
(1957), in a study indirectly bearing 
on this point, have reported that 
over a 5-year period, smokers who 
discontinue smoking for a 2-year 
period show a statistically significant 
(p <.001) weight gain compared 
either to themselves or to a control 
group of smokers who continue to 
smoke. The smokers who stopped 
smoking showed a weight gain of 
3.73 kg. (8.2 lb.) relative to their own
previous weight, while the continuing
smokers showed a weight loss of .50 
kg. (not significant). Hammond and 
Percy (1958) recently have confirmed 
this weight increase when smoking is 
discontinued. These authors found 
that, of 333 ex-smokers, 246 (73.9%)
said they gained weight when they
stopped smoking.


LONG-TERM STUDIES OF SMOKERS


We were able to find only one such
study: a 20-year longitudinal study
of 252 smokers and nonsmokers from
Harvard College. These were all
participants in the well-known Study
of Adult Development (Grant Study),
a long-term study of selected “normal”
college men who were examined
during their undergraduate years
(1939-1942) and who have been followed
by interview and questionnaire
from that time to the present,
a period of 20 years. The smoking
data were incidental to the main objectives
of the study, but have recently
been reported by McArthur
et al. (1958) and Heath (1958). These
reports contain a wealth of anecdotal
and statistical comparisons which are
not easily summarized and for which
their publications should be consulted.

Of the 252 subjects 61 (24.2%)
were nonsmokers, 95 (37.7%) were
moderate smokers, and 96 (38.1%)
were heavier smokers. During the
20-year period heavy smokers and
nonsmokers showed a number of contrasting
characteristics which are
summarized in our Table 5, taken
from Heath (1958, p. 387).

TABLE 5

CONTRASTING CHARACTERISTICS OF NONSMOKERS AND HEAVIER SMOKERS(a)

Nonsmokers                              Heavier Smokers

Bland affect                            Cultural
Inarticulate                            Lack of purpose and values
Well-integrated                         Less well-integrated
Physical sciences                       Practical organizing
Most stable personality                 Less stable personality
Major in college: natural sciences      Major in college: social studies,
                                          arts and letters
Careers: chemistry, physics             Careers: social relations, education
Like: science research worker           Like: judge
Dislike: sales manager                  Dislike: science research worker
Psychotype: cerebrotonia                Psychotype: viscerotonia
Answer questionnaires promptly          Delay answering questionnaires
Armed Service: Navy; well-adjusted,     Armed Service: Army; less well-adjusted,
  noncombat duty                          combat duty
Respiratory rate: slow                  Respiratory rate: rapid
Sighs and swallows: diminished          Sighs and swallows: increased
Reflexes: increased                     Reflexes: decreased
Less alcohol and coffee                 More alcohol and coffee

(a) From Heath (1958, p. 387).

The results of this study are important
because, as a 20-year longitudinal
study, it contains the only
published observations of their kind.
The findings point to many psycho- 
logical variables which deserve fur- 
ther study. However, since many 
hundreds of variables were studied 
by the several Harvard investigators, 
and since apparently so few vari- 
ables (those shown in Table 5 and a 
few others) reached statistical signif- 
icance, there is always left open the 
question whether or not the reported 
findings are due to chance phenom- 
ena. McArthur et al. are aware of 
this possibility (1958, pp. 273-274) 
and offer some cross-validational 
findings from two classes of recent 
Harvard undergraduates. The bulk 
of the findings, while very suggestive 
and stimulating, needs confirmation, 
however. 
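
The concern about chance findings can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the figure of 300 variables is a hypothetical stand-in for the "many hundreds" mentioned above, the .05 criterion is assumed rather than reported, and the tests are treated as independent, which the Harvard variables almost certainly were not.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """Binomial probability that at least k of n independent null tests reach significance."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# Hypothetical values, chosen only to illustrate the argument.
N_VARIABLES = 300  # assumption: "many hundreds" of variables
ALPHA = 0.05       # assumption: conventional significance criterion

if __name__ == "__main__":
    print("Expected chance findings:", expected_false_positives(N_VARIABLES, ALPHA))
    print("P(at least 10 by chance):", prob_at_least(10, N_VARIABLES, ALPHA))
```

Under these assumptions roughly fifteen variables would be expected to reach significance by chance alone, which is why the cross-validation offered by McArthur et al. carries so much of the weight.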


FACTORS INFLUENCING THE 
INITIATION OF SMOKING 


Despite the likely importance of 
this subject, little systematic study 
has been made of the possibility that 
the factors associated with the initia- 
tion of smoking may not be the same 
as those associated with its continua- 
tion. In our own opinion, there is 
considerable a priori reason to be- 
lieve that the factors which motivate 
people (especially teenagers) to start 
to smoke are probably very different 
from those factors which help per- 
petuate the habit once smoking has 
gone on for any length of time.3

The earlier described study by 
Horn et al. (1959) of 22,000 high 
school students points to factors as- 
sociated with the initiation of smok- 
ing (parental smoking habit, par- 
ticipation in school activities, etc.) 
which could hardly be thought to
have equal importance for continuation
of smoking in later years. This
age group provides many opportunities
to study both peer and
parental influences associated with
teenagers’ beginning to smoke. The
recently published results undoubtedly
will prove interesting and provocative
to other investigators. Except
for the few indirect hints arising
from the 20-year follow-up study of
Heath (1958) and McArthur et al.
(1958), little is known about why
Americans continue smoking once
they have started.

3 To examine this hypothesis, personnel and
facilities of our department were made available
to Horn of the American Cancer Society.

A second study has been carried 
out by Phanishayi (1951). Although 
the number of subjects studied was 
small (48 male college graduate and 
postgraduate students), and the 
country (India) not our own, he was 
able to gather suggestive data on 
self-reported reasons why these col- 
lege men (mean age 25) began to 
smoke and also, once having started, 
why they continued to smoke. Pha- 
nishayi prepared a 48-item question- 
naire on which each student was re- 
quired to check for 24 items those 
reasons, and only those reasons, 
which were associated with his be- 
ginning to smoke and, for the re- 
maining 24 items, only those reasons 
why he continued to smoke. Some 
of the items in the two lists were 
identical. The two parts of the 
questionnaire thus had certain ele- 
ments peculiar to each smoking 
stage and certain others common to 
both. At the end of the 24 state- 
ments in both Parts I and II, a pro- 
vision was made for the subject to 
give other causes if he had any. In 
addition, each subject was asked to 
rank-order his own choices as to the 
three most important reasons why 
he began and why he continued to 
smoke. The 48 subjects were told 
the purpose of the investigation and
were requested to try to be true to
facts as far as possible. The results 
of this study were as follows. 

Initiation of smoking: the reasons 
checked for the beginning of the 
smoking habit are diverse and varied, 
although the most frequently stated 
reason is a curiosity as to the nature 
of pleasure afforded by smoking. The 
most frequent reasons for beginning 
and the percentage of the students 
listing each were as follows: 

I wanted to see what sort of pleasure I 
would get out of it (75%). 


I thought there was nothing wrong in doing
so (52%).


I thought it would help me sit up during
nights for study (50%).


I thought there must be something attractive
about it because so many people do so
(48%).


I thought that a trial would cost nothing 
8%). 


Two of the reasons with the lowest 
frequencies were: 


I was tempted by the advertisements (4%). 

I wanted to rebel against the authority of 
my parents and smoking is one of the ways
of doing so (4%). 


Continuation of smoking: the reasons
checked why smoking is continued
seem to have a strong psychological
basis, especially involving reduction
of anxiety, tension, and
loneliness. The most important
stated reasons why these 48 students
continued to smoke and the percentages
were as follows:

It serves as a companion when I am alone
(75%).

It warms me up when I am cold (71%).

It helps to forget my worries and anxieties
(60%).

It facilitates thinking and gives inspiration
(52%).

It enables me to acquire new friends easily
(44%).


The lowest frequency reason given
for continuing to smoke was:


To give up would be a submission to
orthodoxy (4%).


Despite the few subjects studied,
Phanishayi’s results suggest (1951,
p. 36) that

the causes which prompt a person to [begin]
the habit (curiosity, especially) are different
from those which are responsible for its continuation
(tension-reduction capabilities, especially).


There is every reason to expect
that the several expected reports
from the sample of 22,000 high school
students of Horn et al. will yield results
for beginning the habit similar
to Phanishayi’s, in addition to others
which Horn’s different methodology
may unearth. As a matter of fact,
unpublished findings in the [illegible]
report of Schubert (1959) on
northeastern United States college
students indicate that the reasons
given by these students for beginning
to smoke were not too unlike those
of Phanishayi’s sample. Schubert
(1959) also found that for both
male and female subjects three scales
of the MMPI (Ma, Hy, and [illegible])
differentiated smokers and nonsmokers.

As to the reason for excessive smoking
once the habit has begun, Bergler
(1946, 1953), on the basis of [illegible]
cases of compulsive smokers who
underwent psychoanalysis, suggests that a

[. . .] structure
was found: These patients represented [. . .]
reassurance in life (1946, p. 320).


FACTORS INFLUENCING THE
TERMINATION OF SMOKING


How many people quit smoking
once they start is not yet certain.
Recent studies (Haenszel et al., [year illegible],
p. 24; Hammond & Percy, 1958,
p. 2956) place the number at under
20%. One of the best studies is the 
recent one by Hammond and Percy 
(1958). 

Hammond and Percy study. Their 


population and results were as fol- 
lows: 


Of 3,560 out of 5,992 men (selected from 
telephone directories all over the country) 
who filled out a questionnaire on smoking 
habits, 2,498 (70.2 per cent) stated that they 
smoked cigarettes regularly or had done so in 
the past. Of these 2,498 men, 472 (18.9 per 
cent) stated that they no longer smoked 
cigarettes or tobacco in any form. A questionnaire
asking why they had stopped smoking
was sent to the 472 ex-smokers, and 333 
(70.6 per cent) replied. 

Only 6.3 per cent of the 333 ex-smokers 
said that they gave up the habit because of 
reports linking smoking to lung cancer, and an
additional 2.4 per cent said that they gave it 
up because of reports that smoking has a bad 
effect on health in general. In other words, 
only 1.6 per cent of men with a history of 
regular cigarette smoking said that they gave 
up the habit because of reports relating 
cigarette smoking to lung cancer or other
diseases. Some condition apparently made 
worse by smoking was given as a reason for 
stopping by 208 (62.5 per cent) of the 333 ex- 
smokers. Coughing was the most frequently 
mentioned reason for giving up the habit. 

Some improvement, such as less coughing, 
less shortness of breath, etc. was noted by 272 
(81.7 per cent) of the men as an apparent re- 
sult of giving up smoking. Of the ex-smokers, 
246 (73.9 per cent) said that they gained 
weight when they stopped smoking. 

. . . It is obvious that only a very small percentage
of cigarette smokers have given up
the habit consciously and admittedly because 
of reports linking cigarette smoking to lung 
cancer and other serious diseases. Even if the 
reported link with lung cancer was a con- 
tributory factor in several times as many cases 
as recorded on these questionnaires, it still 
was relatively unimportant in terms of induc- 
ing cigarette smokers to stop. On the other 
hand, reports on lung cancer and longevity 
may well be a major factor in the remarkable 
increase in popularity of filter-tip cigarettes 
which are advertised as having “less tar and 
nicotine” (p. 2959). 
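
The percentages in the passage just quoted follow from the reported counts, and the 1.6 per cent figure appears to be the product of two of them. The Python sketch below recomputes them. The variable names are ours, and the final step assumes (as the quoted passage seems to) that the 333 men who replied are representative of all 472 ex-smokers.

```python
def pct(part: float, whole: float) -> float:
    """Percentage rounded to one decimal place, as in the quoted passage."""
    return round(100.0 * part / whole, 1)


# Counts as quoted from Hammond and Percy (1958).
respondents = 3560        # men who filled out the first questionnaire
ever_smokers = 2498       # regular cigarette smokers, current or past
ex_smokers = 472          # no longer smoke tobacco in any form
ex_smoker_replies = 333   # ex-smokers who answered the follow-up

ever_smoked_pct = pct(ever_smokers, respondents)
quit_pct = pct(ex_smokers, ever_smokers)
reply_pct = pct(ex_smoker_replies, ex_smokers)
condition_pct = pct(208, ex_smoker_replies)
improved_pct = pct(272, ex_smoker_replies)
weight_gain_pct = pct(246, ex_smoker_replies)

# Share of replying ex-smokers citing lung cancer (6.3%) or general health (2.4%) reports,
# projected onto everyone with a history of regular cigarette smoking.
report_share = (6.3 + 2.4) / 100.0
quit_because_of_reports_pct = round(100.0 * report_share * ex_smokers / ever_smokers, 1)

if __name__ == "__main__":
    print(ever_smoked_pct, quit_pct, reply_pct)
    print(condition_pct, improved_pct, weight_gain_pct)
    print(quit_because_of_reports_pct)
```

The last line multiplies 8.7 per cent of ex-smokers by the 18.9 per cent of smokers who had quit, which is where a figure near 1.6 per cent comes from.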


The finding of Hammond and
Percy that the many reports linking
lung cancer and cigarette smoking
have apparently induced only a few
smokers to give up the habit is
probably an accurate estimate of the
facts for the general public; this is
dramatically attested to in a second study.

Lawton and Goldman study. These 
investigators (1958) conducted a sur- 
vey of 72 internationally renowned 
lung cancer scientists who attended 
the American Cancer Society spon- 
sored conference on lung cancer at 
Virginia Beach in 1957. Their control 
group consisted of 72 psychologist- 
scientists from the Division of Ex- 
perimental Psychology of the Amer- 
ican Psychological Association. Both 
groups were matched for age and sex, 
and roughly for scientific nature of 
their interests. 

Interestingly, the two groups did 
differ significantly in (a) the per- 
centage of current cigarette smokers 
(p<.01, one-third fewer current 
smokers among the lung cancer 
scientists), and (b) past incidence of 
cigarette smoking (p <.05, with one-third
more lung cancer scientists having
never smoked). This difference
in current and life-long smoking 
pattern, with more nonsmokers 
among the lung cancer scientists in 
contrast to the behavioral scientists,
would appear to be an independent
confirmation of Heath’s
(1958) finding in the 20-year follow- 
up study of college major and career 
choices of Harvard smokers and non- 
smokers described earlier (see Table 
5). 

Lawton and Goldman's next find- 
ing was not surprising in view of the 
different areas of specialization in- 
volved (and thus presumed firsthand 
familiarity with the scientific literature
in each of the two fields): 83.3%
of the lung cancer scientists versus 
only 63.6% of the psychologist-scien- 
tists felt that cigarette smoking is a 
cause of lung cancer (p<.01). When
the number of smokers in both groups
was held constant, this significant 
difference still existed. 

Despite this greater knowledge 
and/or conviction in the lung cancer 
scientist group regarding the health 
hazard brought on by cigarette 
smoking, Lawton and Goldman’s 
next finding was that there was not a 
significantly greater number of lung 
cancer scientists who had quit smok- 
ing in the period 1952-1957; nor did 
a significantly greater percentage of 
them express a dissatisfaction with 
their smoking habit (25% in one 
group and 27% in the other expressed 
dissatisfaction that they were cigarette
smokers).

A final part of their study showed 
that expressed attitude toward lung 
cancer causation did have a signifi- 
cant effect upon the smoking be- 
havior of the psychologist sample of 
smokers: smoker-psychologists who 
felt that smoking is a cause of lung 
cancer tended to have a lower current
incidence of smoking (p<.02), attempted
to cut down the amount of
their daily consumption significantly 
more (p <.01), had a greater number 
of (unsuccessful) attempts to stop 
smoking (p <.05), and expressed 
more dissatisfaction with their current
smoking habit (p<.02). The
fact that there were so few (3 out of 
70) lung cancer scientists who felt 
that smoking does not cause lung 
cancer made a similar analysis of 
these attitudes for this group not 
meaningful. However, the fact that, 
relative to the psychologists, there
was not a significantly greater num- 
ber of quitters in the lung cancer 
scientist group, with their greater ex- 
pressed conviction of the health 
hazard, again emphasizes the im- 
portance of noncognitive elements 
in smoking behavior, and highlights 
the fact that even for health experts
the health hazard per se is not a sufficient
deterrent to cigarette smoking.
Thus, one is not surprised that Ham- 
mond and Percy (1958) found that 
only 6.3% of ex-smokers said they 
quit smoking because of the lung 
cancer reports. Whatever factors 
motivate continuance of smoking are 
thus seen to be more potent than 
scientifically accepted probability
statistics showing a greater health
risk.

Study of Harvard men. McArthur 
et al. (1958, pp. 272-273), reasoning 
from a very small sample of those 
smokers in their sample who could
stop and those smokers who could not
break the habit, found some slight
suggestions that early breast-feeding
habits and later personality integration
seem to be related to ability or
inability to quit smoking. However,
these findings and those of Bergler
(1946, 1953) on the reasons for excessive
smoking probably will have to be
investigated by other measures of
“personality integration” in addition 
to the clinical interview and the 
Rorschach inkblot technique. 


DISCUSSION


Table 6 is a summary of the characteristics
which do, and those which
do not, differentiate smokers from
nonsmokers. It is clear from the
studies just reviewed that our knowledge
of the personality and psychosocial
characteristics of smokers and
nonsmokers is only in its beginnings.
Nevertheless the studies do suggest
some answers to pertinent questions
which have been raised in this general
area.

TABLE 6
SMOKERS AND NONSMOKERS: SUMMARY OF DATA REVIEWED

Differentiating Characteristics

Age
Sex
Foreign-born parents
Current marital status; marital history
Occupation, including unemployment
Military service, current or history of
Frequency of job changes
Urban-rural residence
Residence, United States coastal versus noncoastal
Income
Social class, including social mobility
College graduation history
Delicate motor coordination
Characteristics from 20-year study summarized in Table 5
Body sway suggestibility
Driving accidents
Participation in sports
Taylor anxiety score
Psychological tension level (CMI)
Psychiatric versus medical status
Psychosomatic symptoms, number of
Emotional status (“neuroticism”)
Extraversion-introversion
Number of hospitalizations
Coffee consumption
Alcohol consumption
Weight increase
Dark adaptation
Cancer-scientist versus psychologist-scientist
Smoking habits of parents
Educational level of parents
Own education-age concordance
Parochial versus public school attendance
Participation in school activities

Nondifferentiating Characteristics

Race, United States white versus United States nonwhite
Hollingshead Index of Socioeconomic Status
Education, highest grade attained
IQ
Mental and motor functioning, immediate effects upon


First, the question of a “smoker's
personality.” Inspection of Tables [number illegible]
through 5, plus consideration of the
other studies reviewed, makes clear
that while smokers do differ from
nonsmokers in a variety of characteristics,
none of the studies has shown
a single variable which is found ex- 
clusively in one group and is com- 
pletely absent in the other. While 
this is true for all of the variables 
summarized in the appropriate 
column in Table 6, it is especially 
true for the variables measuring per- 
sonality characteristics. Thus, while 
Taylor anxiety score, and Cornell 
Medical (psychiatric) Index, etc. sig- 
nificantly differentiate smokers from 
nonsmokers, the mean differences are
not large. Examination of the means,
standard deviations, ranges, percent- 
ages, etc. of the various published 
studies makes clear that while group 
trends suggest the smoker to be more 
“neurotic,” on the average, there are 
still many individual smokers with 
neuroticism, or anxiety, etc. scores 
lower than those of many nonsmok- 
ers, and vice versa. Thus a clear-cut 
smoker's personality has not emerged 
from the results so far published in 
the literature. This is not surprising
when it is remembered that approximately
60 million Americans over the
age of 18 smoke. It is hard to believe 
that they would share in common one 
personality “type.” This is not to 
imply, however, that the various psy- 
chological dimensions along which 
smokers have been shown, as a group, 
to differ from nonsmokers may not 
suggest an important single process, 
or processes, underlying these various 
demonstrated differences. Further 
research may indeed so systematize 
the disparate findings. 

While the evidence for a smoker's 
personality is not strong, it is possible 
to think that there may be in certain 
individuals a biological or genetic pre- 
disposition to a strong desire to smoke 
(as well as to lung cancer). To estab- 
lish that cigarette smoking is geneti- 
cally determined would require stud- 
ies yielding more definitive data than 
are as yet available.4 Nevertheless,
Fisher (1958) believes this genetic
hypothesis has some merit on the
basis of a reported German study of
monozygotic and dizygotic twins, in
which the smoking behavior of 51
pairs of monozygotic twins is reported
to be more nearly alike than
that of 31 pairs of dizygotic twins.
Fisher writes:


The data so far assembled relate to 51
monozygotic and 31 dizygotic pairs, from
Tübingen, Frankfurt and Berlin. Of the first,
thirty-three pairs are wholly alike qualitatively,
namely, nine pairs both non-smokers,
twenty-two pairs both cigarette smokers and
two pairs both cigar smokers. Six pairs,
though closely alike, show some differences in
the record, as in a pair of whom one smokes
cigars only, whereas the other smokes cigars
and sometimes a pipe. Twelve pairs, less
than one-quarter of the whole, show distinct
differences, such as a cigarette smoker and a
non-smoker, or a cigar smoker and a cigarette
smoker.

By contrast, of the dizygotic pairs only
eleven can be classed as wholly alike, while
sixteen out of thirty-one are distinctly different,
this being 51 per cent against 24 per cent
among the monozygotics.

The data can be rearranged in several ways
according to the extent to which attention is
given to minor variations in the smoking
habit. In all cases, however, the monozygotic
twins show closer similarity and fewer
divergencies than the dizygotic.

There can therefore be little doubt that the
genotype exercises a considerable influence on
smoking, and on the particular habit of
smoking adopted, and that a study of twins
on a comparatively small scale is competent
to demonstrate the rather considerable differences
which must exist between the different
groups who classify themselves as
non-smokers, or the different classes of
smokers (p. 108).


4 After this review went to press, Eysenck,
Tarrant, Woolf, and England (1960), in a
study of 2,360 British male subjects, reported
finding that cigarette smokers are more extroverted
than nonsmokers. Reasoning from
Eysenck's earlier work on a suggested genetic
basis for his extraversion factor, they concluded,
therefore, that genotypic differences
exist between smokers and nonsmokers,
and between cigarette smokers and pipe
smokers. They suggest a number of further
studies to establish this hypothesis even more
soundly.
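
The percentages in Fisher's passage can be recovered from his counts, and the size of the monozygotic-dizygotic difference can be gauged with a simple test. The Python sketch below is ours, not Fisher's: it collapses each pair into "distinctly different" versus "not distinctly different" and applies an uncorrected Pearson chi-square for a 2 x 2 table, an analysis the quoted passage does not report.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


# Counts as quoted from Fisher (1958).
mz_pairs, mz_different = 51, 12   # monozygotic: 12 of 51 pairs distinctly different
dz_pairs, dz_different = 31, 16   # dizygotic: 16 of 31 pairs distinctly different

mz_pct = 100.0 * mz_different / mz_pairs   # the passage gives 24 per cent
dz_pct = 100.0 * dz_different / dz_pairs   # 51.6; the passage gives 51 per cent

chi2 = chi_square_2x2(
    mz_different, mz_pairs - mz_different,
    dz_different, dz_pairs - dz_different,
)
p = p_value_df1(chi2)

if __name__ == "__main__":
    print(f"MZ distinctly different: {mz_pct:.1f}%")
    print(f"DZ distinctly different: {dz_pct:.1f}%")
    print(f"chi-square (1 df) = {chi2:.2f}, p = {p:.4f}")
```

Under this simplification the statistic comes out near 6.8 on 1 degree of freedom, with p slightly below .01. A different grouping of the "closely alike" pairs, or a continuity correction, would change the figures somewhat.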


Hammond (1958), on the other
hand, believes this genetic hypothesis
untenable:


me 
It has been suggested that there may ba Sung 
hereditary factor which results in bar ettes 
cancer and a strong desire to smoke cigar it is 
This is an ingenious idea. However, i rette 
true, one must assume that (since aa the 
smoking has increased so considerably i this 
past half century) a genetic factor Spread 
sort appeared and became widely coun 
throughout the populations of many ae 
tries during the last fifty years. This “nation 
bit unlikely. Anyone with a good imag utual 
can think of other such conceivable et pre 
causation hypotheses, but no one ne em ps 
sented any evidence in support of yference? 
Other evidence, such as the sex di nds i 
the urban-rural difference, the time aia 
lung cancer death rates and cigarette co” poth- 
tion is consistent with the causal n 
esis... . With so much evidence all Tence 
ing in the same directon and no Sa onl 
Pointing in any other direction, I ¢4 inio™ 
arrive at one conclusion. In my ci the 
Cigarette smoking is a major factor 
causation of lung cancer (p. 350). 


It is clear from his remarks that
Hammond is as skeptical of the evidence
suggesting a hereditary predisposition
to the desire to smoke as he
is of an alleged hereditary predisposition
to lung cancer.

However, in fairness to the genetic 
hypothesis, it should be pointed out 
that the possibility exists of a constant
frequency of genetic predisposition
combined with a progressive increase
in the availability of cigarettes
during the past 50 years. In short, an
interacting, ecological relationship 
could exist between genetic make-up 
and culturally determined economic 
conditions (mass production, market- 
ing, etc.). 

A third question that deserves con- 
sideration is the possibility that 
smoking behavior is determined by 
multiple factors rather than by a 
single factor (personality or geno- 
type). Table 6 suggests that for any 
given individual, his smoking be- 
havior may be socioculturally deter- 
mined, age-sex linked, related to oc- 
cupation, and/or associated with a 
variety of personality and other be- 
havioral characteristics. The present 
meagre available evidence does not 
permit the determination of the rele- 
vant weight of these pertinent factors 
singly. Nor are there studies which 
show the multiple correlation with 
smoking across individuals, or within 
a single individual, of the various 
characteristics shown in Table 6. 

In addition to these uncertainties, 
it should be added that none of the 
studies here reviewed provides an 
answer to the “cause and effect” 
question in the relationship between 
psychological characteristics and 
smoking behavior. It will be clear to 
the reader that the studies just re- 
viewed showing, for example, slightly 
more “anxiety” or higher “neurotic 
indices” in smokers compared to nonsmokers
were merely studies of a relationship
between smoking and these
personality variables, since in no way 
were the studies designed to investi- 
gate cause and effect. There could be 
a complex mutually causative rela- 
tionship between smoking and the 
various characteristics shown in 
Table 6 (as well as smoking and lung 
cancer and/or coronary heart dis- 
ease). As Hammond (1958) points 
out, at this stage of our knowledge, 
for example, 

there is as much reason to suppose that ciga- 
rette smoking causes nervous tension as to 
believe that nervous tension causes cigarette 
smoking. Perhaps they are mutually causative
as in an autocatalytic type of reaction
(p. 352). 


A similar statement regarding cause 
and effect could be made about each 
of the many other characteristics re- 
lated to smoking shown in Table 6. 
Longitudinal studies of youngsters 
before and after they have begun to 
smoke might provide some leads to a 
partial answer to the ‘‘cause versus 
effect” question. One such study 
might employ an anxiety question- 
naire and could provide information 
as to whether nonsmoking youngsters 
with higher anxiety scores begin to or 
do not begin to smoke in greater per- 
centages at a given age than do 
youngsters with lower anxiety scores. 
In addition, readministration of such 
an anxiety test several years later 
would permit study of the question 
whether or not youngsters who take 
up smoking later earn higher anxiety 
scores than control youngsters who 
have never taken up smoking; or earn 
higher anxiety scores relative to their 
own presmoking anxiety scores. 
Numerous other designs could supply 
additional information regarding 
cause, effect, and mutual interaction. 

In summary one can suggest that,
while this review and the data summarized
in Table 6 make clear that
smokers and nonsmokers do, in fact, 
differ on a number of psychological, 
personal, social, and behavioral char- 
acteristics, it is equally clear that re- 
search in this area (including appro- 
priate selection and sizes of samples, 
controls, etc.) has just begun. The
number of recent psychological studies
of smokers and nonsmokers suggests
that publications in the next
few years may reflect an even greater 
interest among behavioral scientists 
in this very common form of human 
behavior.


THE STRUCTURE OF ABILITIES AT THE
PRESCHOOL AGES:
HYPOTHESIZED DOMAINS¹


C. E. MEYERS 
University of Southern California 


This paper discusses the possibilities
for factorial descriptions of
the abilities of the infant and young 
child. It brings forth the practical 
and theoretical need for test recon- 
struction and shows the potentials for 
doing so which reside in the current 
repertory of test materials. Finally 
the paper hypothesizes a series of 
factors which may ultimately be 
identified at the late preschool level, 
with some speculations regarding the
early appearance of the hypothesized
factors.


CHARACTERISTICS OF ABILITY EXAMINATIONS
FOR INFANTS AND
YOUNG CHILDREN


Most of the instruments in current 
use for appraisal of the abilities at 
the preschool ages were constructed 
after the prototypes of Binet and 
Gesell. In both instances these 
pioneers faced a need to describe a 
child in direct terms and for prac- 
tical usage. They took the easiest 
approach possible. The examined 
child was tried out on the behaviors 
of his own and other age groups, and 
was then described in terms of the 
age norms of the test items he passed. 

The test construction and report- 
ing of results so developed are illus- 
trated in the construction and use of 
the popular instruments in contemporary
clinical practice. Today's
scale construction also employs the
two technical criteria for selective
retention of tried-out test items,
age-progression and internal consistency.
Inasmuch as the “common”
behavior of young children
changes with growth, the employment
of these technical criteria
sharpens any age-to-age difference in
the abilities sampled at different
levels of the examination, and narrows
the spectrum of abilities sampled
at any one level.

1 Supported in part by the National Institute
of Mental Health Grant 3M-9130:
Population Movement of Mental Defectives
and Related Physical, Behavioral, Social,
and Cultural Factors. Pacific State Hospital,
Pomona, California.

Table 1 lists a series of standardized
examining instruments available
for use at the various preschool
ages. Most of the instruments of
Categories A through D, which are
the clinical tests of most concern
here, were constructed on Binet-type
principles and procedures, and yield
an “age score” as a test result. The
tests apparently satisfy enough of the
clinical need to describe a child as he
is now, in terms of age norms.

TABLE 1
ILLUSTRATIVE CURRENT ABILITY MEASURES WITH NOTATIONS ON
ITEM SELECTION AND AGE-SCORING

Type of Instrument and                  Preschool         Bases for Item Selection                    Kind of Score Yielded
Stated or Implied Scope                 Ages Tested

A. General Behavioral Development
  Gesell & Amatruda (1947)              1 month up        Age-normal behavior                         DA (by median success level)
    “Gesell schedules”
  Griffiths (1954)                      1-24 months       Age-normal behavior; “G” broadly conceived  Cumulated MA
    Abilities of babies

B. General Intelligence
  Cattell (1940)                        3-24 months       Age-normal behavior; some selection by      Cumulated MA
    Infant Intelligence Scale                             face validity for intelligence
  Shotwell, Dingman, & Tarjan (1957)    3 years up        Quantitative tasks, progression with MA     Cumulated MA
    Number concept test
  Terman & Merrill (1937)               2 years up        r with 1916 SB; face validity, age          Cumulated MA
    Stanford-Binet                                        progression, internal consistency
  Valentine (1950)                      2 years up        Age-normal; face validity                   Cumulated MA
    Intelligence tests for children
  Ammons & Ammons (1948)                2 years up        Age-suitable recognition vocabulary,        MA from points; centiles
    Picture vocabulary                                    r with SB

C. General Intelligence by Nonlanguage, Culture-Free Means
  Arthur (1947)                         4½ years up       Face validity for intelligence; r with      MA from points
    Point Scale                                           other tests
  Burgemeister, Blum, & Lorge (1954)    3 years up        Nonverbal, ease of resp., suitable to       MA from points
    Columbia Mental Maturity                              age, face, r with other tests
  Leiter                                2 years up        Age-suitable culture-fair, face validity    Cumulated MA
    International Performance Scale                       for G

D. Purposes Other than to Measure Intelligence
  Bayley (1935)                         1-50 months       Age-normal motor                            Points; sigma for age
    California Motor Scale
  Doll (1953)                           1 year up         Age-normal, self-help, culture-required     Social age by points
    Social Maturity Scale                                 conduct
  Sloan (1955)                          4 years up        Motor, selected from Oseretsky by           Points to centiles by age and sex
    Lincoln-Oseretsky Motor Tests                         objectivity, age progression

E. School Readiness Examinations
  Hildreth & Griffiths (1949)           kinderg.-primary  Face validity via analysis of primary       Centile of points for Grade 1
    Metropolitan Readiness Tests                          reading; r with later reading
  Lee & Clark (1951)                    kinderg.-primary  Face validity by analysis of primary        Centile of points for Grade 1
    Reading Readiness Test                                reading
  [author and title illegible]          kinderg.-primary  Analysis of primary reading; r with         Centile of points for ages
                                                          later reading

F. Batteries for Testing Differentiated Abilities
  Thurstone & Thurstone (1953)          5-7 years         Factor analysis                             Factor points to factor MAs; total MA
    Primary mental abilities
  Sievers (1955)                        2-6 years         Osgood psycholinguistics model              Subtest and total point norms by age (11 tests)
    Illinois language tests

It can be shown that the instruments
are not the best conceivable
ones for certain other purposes. One
such purpose is that of clinical prediction.
Over a dozen studies have
demonstrated that baby tests and
early preschool tests do not predict
later intelligence very well (most recently,
Cavanaugh, Cohen, Dunphy,
Ringwall, & Goldberg, 1957; Wittenborn,
1956). It is also shown that the
best items for predicting future status
may be poor items on a criterion of
internal consistency at the level where
they are placed (Nelson & Richards,
1938, 1939). All such studies (except
those of Gesell) were preoccupied 
with the prediction of future “‘intelli- 
gence,” the criterion variable being a 
later Binet or other IQ. Little has 
been done by psychologists on the 
prognostic value of early testing for 
other variables of social and theo- 
retical interest. Will the motor sub- 
scale in Gesell and Amatruda (1947) 
tell of later athletic prowess or the 
age of ambulation in handicapped 
children? Do the personal-social 
items in Griffiths (1954) predict 
leadership in the fifth grade? Sur- 
prisingly little is available. Neilon 
(1948) reported good consistency in 
general behavior descriptions be- 
tween infancy and adolescence in 
Shirley’s (1933) famous subjects, and 
provides a good review of available 
literature. 

Gesell (e.g., Gesell, Castner, Thompson,
& Amatruda, 1939; a summary
in 1954) found consistency within
broad categories of diagnosis, such as
mentally deficient or palsied, and
some intriguing instances of temperamental
consistency in normals as
well. Escalona (1950) and Gallagher
(1953) demonstrated that prediction
from infant testing gave better results
if subject testability was adequately
attended to. Neither found a
degree of improvement that alters
the general conclusion on predictability.

There have been three efforts to
identify test items having forecasting
value. In the Fels series (Nelson
& Richards, 1938, 1939; Richards &
Nelson, 1939) the Gesell items of
alertness and perception at 6, 12, and
18 months had promising correlations
with later IQ and were superior to
the whole scale in this. The Berkeley
data (Hastings, 1952; Pinneau, [year illegible];
summary by Bayley, 1955), have
similarly been analyzed, yielding
little of value before 18 months.
After 2 years the prediction becomes
surer. Tasks which are verbal and
complex seem best. A third such
study comes from Catalano and
McCarthy (1954), where measures of
infant consonantal differentiation cor- 
related about .40 with Binet later on, 
the magnitudes being little reduced 
when age of phonemic recording or of 
testing was partialed out. 
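
The partialing out mentioned above can be illustrated with the standard first-order partial correlation. The sketch below is not taken from Catalano and McCarthy; the function name and the numerical values are hypothetical, chosen only to show why a correlation of about .40 is little reduced when the third variable (here, age) is only weakly related to the other two.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values: predictor-criterion r of .40, with age related
# only weakly to the predictor (.10) and to the criterion (.15).
print(round(partial_r(0.40, 0.10, 0.15), 3))
```

When both correlations with the partialed variable are small, the numerator and the denominator each change only slightly, so the partial r stays close to the zero-order r.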

The above paragraphs suggest that 
precious little developmental test 
work has come forth in many years.
Ability testing, preoccupied
with technical purity of verbal and 
nonverbal G, has chosen to eschew 
the imitative, psychomotor, and other 
functions which do not correlate well 
“with later IQ,” omitting large areas 
of human competence which might 
have values in their own rights. 
Creative hours have been dedicated 
to projections of conative life. 

For want of usable standardized 
instruments, the personnel in certain 
healing arts have had to resort to 
improvisation. Other productions 
are unpublished, being used in the 
specific hospital, clinic, or school. 
Both varieties show a range of quality 
and occasional inventiveness in spe- 
cific items. Most are improvisations 
with existing materials, the Gesell 
items being the most favored. All 
share the shortcomings consequent 
to the lack of test-making art: (a) 
unclear directions for administration 
and scoring, (b) unspecified norms or 
none at all, (c) unawareness of how 
these shortcomings, together with 
“adaptation for local use,” can affect 
the difficulty level of an item. Some 
instruments worthy of mention, in
spite of the shortcomings, are those 
of Haeussermann (1958), Johnson, 
Zuck, and Wingate (1951), and 
Kogan (1957). The clinical psycholo- 
gists working with such service per- 
sonnel have tried to choose more care- 
fully from existing instruments and 
have shown more awareness of the 
standardization problems (Gallagher, 
Benoit, & Boyd, 1956; Shontz, 1957).
One must at this point remind the
clinical psychologist that it was a 
physician who developed the Bender- 
Gestalt (Bender, 1938). 

In spite of such demonstrated call 
for the services of psychological test 
makers, there has been little advance 
in either concept or practice in the 
ability testing of the young since the 
time of Binet. The elegant simplicity 
of the age-scale notion served the 
needs well for several decades. It is, 
however, no longer possible to defend 
narrow spectrum G testing. Voices 
have appealed for change (Bayley, 
1955, 1956; Sarason & Gladwin, 
1958; Thurstone, 1956). But the 
only energies exerted toward new 
scales which the present writers have 
knowledge of seem limited to those of 
students of Kirk and Osgood at Illi- 
nois (e.g. Sievers, 1955). 


POTENTIAL CONTRIBUTIONS OF 
FACTORIAL STUDY 


Beyond the need for greater clinical 
breadth and prediction, the growth 
studies could very well use some in- 
struments which provide whatever 
continuity is available from one age 
to another in the functions tested. 
Even more intriguing than continuity 
is the emergence and differentiation 
of those abilities which one calls “‘fac- 
tors’’ when the child grows up. 

TABLE 2
FACTORS REPORTED IN STUDIES PERFORMED AT PRESCHOOL AGES

[The row-by-column layout of this table is not recoverable from the source.
Column heads: Richards & Nelson (6, 12, 18 months); McNemar (1942), 14
analyses for 2 years and up (2-2½, 3-4, and 5-6 years); Hofstaetter (1954)
(0-2 years, 2-4 years, 3 years up); Kelley (1928) (3½-6¼ years);
Thurstone & Thurstone, Primary Mental Abilities (5-7 years).
Surviving factor entries: maturity-heterogeneity, etc.; (intercorrelations
.5 & up); spatial no. 1; spatial no. 2; perceptual speed; space;
identifying; quantitative; numeration; auditory memory span; motor;
motor or memory; sensorimotor alertness; persistence; intelligence;
memory; alertness.]

Factors with apparent likeness are horizontally aligned.
This array is that of the published test; the factor study was not published.
Auditory memory span: labeled “verbal” by Kelley but tasks were two auditory span lists.
Motor or memory: test items were form board, block tower, digit repetition.

The study of abilities at older
levels has enjoyed considerable sophistication.
A fairly stable structure
of human competencies is taking
form in replicated works (Fleishman,
1957; French, 1957; Guilford, 1956,
1957, 1958). Factorial study is not in
its infancy. Its contributions no
longer need to be in the form of
scattered dramatic discoveries, but
are systematic, stepwise contributions.
Not all the expected factors
of adulthood have yet been operationally
described, nor have all the
questions one raises about them or
the structure they fit been answered.
Whether the many factors have
market value or how they emerge as
nature-nurture products, are questions
which for the moment are beside
the main issue. If factorial
descriptions of man’s abilities are
available, one can make efficient
measurement of whatever attributes
one chooses or needs for a specific
purpose. The need might be to know
whether immediate memory span is
“fixed” or trainable, what attributes
will efficiently predict successful home
leave from an institution, or the
anticipation and prevention of difficulties
in primary reading. The main
point is that a general ability or G,
if it exists, is not greatly predictive
from one young age to another. Certainly
it is an insufficient description
for a psychologist to make of the
competencies of a person. On the
other hand, factorial study may enable
one to discern and differentially
measure, with efficiency, the particular
congeries of traits which suit a
specific purpose.

It should be of advantage to seek
for continuities between the established
factor structure for literate
adults and the rather bulky amount
of information which is available on
the observations and experimental
study of the infant and young child.


Previous Factor Studies 


Attention is first given to those
factorial studies of abilities known
to have been performed at preschool
ages. Only five, all of them American,
have been found. A perusal of
British summaries of factor literature
(e.g., Thomson, 1950; Vernon, 1950)
did not reveal evidence of any other
preschool analyses. The findings of
these studies are presented (Table 2)
not to claim that some group factors
have been discovered, replicated, and
accepted, which is hardly the case,
but only to show that something
beyond a general factor can be found.
Some discussion of the findings of the
previous studies seems warranted.

TABLE 3
FINAL FACTOR VALUES FOR THE POPULATION OF 107 KINDERGARTEN CHILDREN
AT CLOSE OF FORTY-EIGHTH SUCCESSIVE APPROXIMATIONᵃ

Factor columns, left to right: Maturity, Heterogeneity, etc.; Verbal;
Memory; Spatial No. 1; Spatial No. 2; Control of Meaningless Content.
[The positions of the blank cells are not recoverable from the source;
the surviving values for each test are given in their original
left-to-right order.]

Tests Used                                          Surviving values
1. Memory for Meaningful Forms                      .79   .13
2. Control of Meaningful Visual Memory Images       [illegible]   .27   −.18
3. Memory of Meaningless Forms                      .62   .18   .27   .50
4. Control of Meaningless Visual Memory Images      .63   .15   .42
5. Memory for Verbal Material                       .49   .61   .50
6. Divided Forms Test                               .52   .58   −.24
7. Knox Cube Test                                   .62   .36   .13

ᵃ From Kelley (1928, p. 148), with slight modification.

Richards and Nelson (1939) analyzed
the interitem tetrachoric r's of
the Gesell items in the Fels data.
Clearly emerging on the separate
extractions at 6, 12, and 18 months
were two factors labeled “alertness”
and “motor.” At 12 months there
was a hint of a third factor.
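
Richards and Nelson's computing procedure is not described here. As a rough illustration only, a tetrachoric r for two pass-fail items can be approximated from the fourfold table by the familiar cosine-pi formula; this is a stand-in technique, and the function name and the cell counts below are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for a 2x2 table.

    a and d are the concordant cells (pass-pass, fail-fail);
    b and c are the discordant cells.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("table is degenerate: both cell products are zero")
    if b * c == 0:
        return 1.0  # no discordant pairs on one diagonal
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1.0 + ratio))


# Hypothetical counts for two items given to 100 infants
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))
```

The approximation is adequate only when the items are split not too far from 50-50; with extreme splits a full tetrachoric solution would be needed.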
McNemar (1942) reported on the 
standardization of the 1937 Stanford- 
Binet. Included were analyses which, 
unlike most factor efforts, were de- 
signed to disclose whether the item 
selection of the famous scale was 
carefully enough done to have avoid- 
ed group factors. The unrotated 
centroid analyses demonstrated at 
most age levels the excellence of item 
selection from the viewpoint given: 
a first loading accounted for most of 
the common factor variance at nearly 
all age levels. Exceptions included 
2-0 and 2-6, at which there were two 
apparently unimportant loadings of 
unclear meaning, evident in the 
identifying, memory span, and move- 
ment items. 
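
What it means for a first centroid loading to account for most of the common factor variance can be sketched as follows. This is a minimal illustration and not McNemar's computation: the communality estimate (largest off-diagonal r in each row), the function name, and the four-item matrix are all assumptions, and no reflection of variables is attempted.

```python
import math


def first_centroid_factor(r):
    """Return first-centroid loadings and their share of estimated common variance.

    r is a square correlation matrix given as a list of lists. The diagonal
    is replaced by a communality estimate (the largest absolute off-diagonal
    correlation in that row). Predominantly positive correlations are assumed.
    """
    n = len(r)
    m = [list(row) for row in r]
    for j in range(n):
        m[j][j] = max(abs(r[j][k]) for k in range(n) if k != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("sum of the matrix must be positive")
    # Centroid loading: column sum divided by the square root of the grand sum
    loadings = [s / math.sqrt(total) for s in col_sums]
    common = sum(m[j][j] for j in range(n))
    share = sum(a * a for a in loadings) / common
    return loadings, share


# Hypothetical four-item matrix with uniform intercorrelations of .50
items = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]
loadings, share = first_centroid_factor(items)
print([round(a, 3) for a in loadings], round(share, 3))
```

With uniform intercorrelations the first loading exhausts the estimated common variance; group factors would show up as a share noticeably below unity and as structured residuals.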
Hofstaetter (1954) analyzed a matrix
of interage test correlations. The
results showed a “sensorimotor alertness”
with best loadings in the first
two years, “persistence” from two to
four, and “intelligence” from three
years on. Thus a sophisticated treatment
confirms what is seen in the
inspectional analysis of test content
and the study of age-to-age intercorrelations.

The above-mentioned three studies 
were analyses of already existing 
data. Only Kelley’s and the Thur- 
stones’ analyses sought for factors. 
Kelley (1928) was the pioneer and 
provided factor reports for seventh 
grade, third grade, and kindergarten 
children. With a highly rational but 
painstaking and difficult method no 
longer used, he reported on the upper 
two groups a pattern that we today 
would identify as verbal, numerical, 
spatial, etc., with loadings also on 
what he prefers to call “maturity- 
heterogeneity” rather than G. Only 
his kindergarten array is of interest 
here. The original table is substan- 
tially reproduced in Table 3. 

The findings deserve some discus- 
sion. First, Kelley was one of the few 
in his day in resisting G, believing the 
British data for it to be due to insufficiently
controlled variability in
sex, age, and background of subjects
and to unrealistic treatment of error 
in residuals. Kelley labels his own 
first factor as maturity-heterogeneity 
(his age range at kindergarten was 
3-6 to 6-3). Second, what he called 
“verbal” should be called “memory 
span,” or “immediate memory,” as 
the items required only serial recital 
after one hearing, not semantic inter- 
pretation. This “verbal” (or memory) 
is found for only one entry in the 
matrix. 

Leaving aside the characteristics 
of the work in today’s terms, the 
Kelley report is not contradicted by 
later data. There is replication to 
the extent the Thurstones, with 
group tests, explored down to age 5½.
The Kelley study remains the best 
available model for anticipation of 
factors. 

The second and remaining analysis- 
for-factors’ sake at preschool ages 
was the work of the Thurstones 
leading to the PMA tests (Thurstone 
& Thurstone, 1953) which included a 
level for ages 5-7. According to the 
publisher,² the factorial analysis of
the 5-7 age group has not been pub- 
lished; the norms for the published 
tests were established on a new 
group. The 5-7 PMA has this array: 
motor, perceptual-speed, verbal, spa- 
tial, and quantitative (replaced by 
N at higher levels). “Motor” is tested
by requiring the drawing of pencil 
lines between dots and compares to 
“dynamic precision” at higher levels. 
It is not in the array of the PMAs for 
older subjects. Regarding the Q or 
quantitative, the Thurstones point 
out that N and R evolve from Q 
sometime after the kindergarten-primary
level.

While the original factor study was
not published, the Technical Supplement
to the Manual provides some
information on groups of pupils utilized
for correlational studies. There
is less evidence of factor clarity at 5-7
compared with the higher ages. Intercorrelations
were .50 and up.
With an age range of only 5-9 to 6-8,
the prospect of reduction by partial r
is small. Another table shows r's with
Binet; only motor is below .50.

2 Robert H. Bauernfeind, personal communication,
February 18, 1959.

The PMAs cannot be considered 
replications of specific Kelley factors, 
but the two reports do reinforce the 
conviction that factors can be found. 
Note that Kelley had a first loading 
called maturity-heterogeneity and 
the Thurstones found considerable 
interfactor correlation. To label
either result the consequence of G is
as unwarranted at this time as to
conclude the cause is differential
testability or variance due to testing
conditions. The McNemar report
had little common factor variance
beyond the first loading, but in this
instance “purity” was built into the
test material by preliminary steps.
The general conclusion, therefore, is
that factors will emerge once appropriate
test materials are made available.


Problems in Conducting Factorial 
Studies with the Young 


That only five studies have been
located is not entirely due to lack of
interest. The problems of testability
at preschool ages are discouraging.
It is not until fourth or fifth grade
(age 9 or 10) that a typical group of
children have enough reading ability
and conformity to be tested easily in
full class groups. In second and third
grades (and some fourth) one must
read directions aloud with the children
and provide monitors; even so,
only 30 minutes of adequately controlled
testing are accomplished. One
who watches the process might challenge
the word “adequately.” At
kindergarten it is the custom to do 
readiness evaluation in groups not 
exceeding four or five. School psy- 
chologists are not at all sure that 
even this situation gives control over 
testing circumstances, especially when 
other personnel than themselves do 
the administering. It is almost cer- 
tain that individual differences in 
conformity, distractibility, broken 
pencils, and the like have contributed 
to variance and lie behind the larger 
test and factor intercorrelations found 
in the younger level when different 
levels have been compared. The 
skeptic should watch the PMA being 
administered even to a small, well- 
motivated kindergarten-primary group. 
Below kindergarten, of course, no 
useful testing other than individual 
can be accomplished, and the cost 
factor becomes significant. 


HYPOTHESIZED FACTORS 


Introduction 

It is convenient to return briefly
to the issue of prediction from an
early to a later age. The discovery of
factors at age 5 is of merit in itself,
whether or not what is found has
continuity with abilities at other
levels. Relation with findings at
higher age levels can be taken for
granted. It is another matter with
respect to extension downward from
[...]. There is little
guarantee that any particular ability
seen at age 5 is represented at 3 or
earlier. In other words, [...]
prediction from [...], previously
discussed, may [...]
due to poor instrumentation [...]
degree, qualitative difference exists
between infant and [...]
of whether abilities emerge via
differentiation or [...]
certainly enters. [...]
provides good
discussion of this. But
answers to these and related questions
cannot be produced till the
instrumentation capable of detecting 
what exists has been created. 

A further introductory note is re- 
quired. The word “‘ability’’ needs 
more than an implicit definition. It 
is here regarded as that functioning 
which does not include (a) the so- 
called vestigial reflexes such as the 
plantar; (b) the vegetative responses 
even if they can be volitionally controlled
at times, such as sphincter
activity; (c) random, apparently un- 
aimed “emitted” movement of the 
skeletal sets, such as the arm and leg 
motions of the infant. 

It is more difficult to say what is 
included, but tentatively, it is be- 
havior change which is guided by 
current or previous sensorial input. 
For example, cessation of movement 
at a sound is regarded here as an 
ability. This breadth of concept is 
wanted in order to include within 
the scope of human “adaptive abil- 
ity” any movement, change in move- 
ment, cessation of movement, readi- 
ness for movement, etc., when such 
conduct is performed in direct or de- 
layed consequence of experiencing 
with the senses. Of necessity, such
movements and their attributes must 
include such nonintellectual dimen- 
sions as strength, simple reaction 
time, and the like. 

For clarity’s sake, the above dis- 
cussion and the test items to be men- 
tioned might be thought of in the 
traditional expression: S-O-R. 

As usual, the term S represents 
the stimulus situation or any portion 
of it we choose to center upon, cur- 
rent or past. As usual, R represents 
a response, a muscular activity or
change in same, of any recordable or 
observable sort. Discussion of O is 
postponed for the moment. 

In the use of a test of perceptual
functioning, it is hoped that the obtained
individual differences were
due to perception, not to O or R. A 
response must be made by the sub- 
ject, of course. But the examination 
process provides that there is sim- 
plicity and ease in the means of re- 
sponse. Hence in a perceptual task a 
subject signifies his recognition or 
discrimination by a gesture, a lever 
depression, a word, or a pencil mark. 
On the other hand, if one seeks in- 
dividual differences in response speed, 
strength, dexterity, sentence length, 
imaginativeness, or other quality, 
then the input factors should bring 
no differential difficulty to the sub- 
jects. 

With respect to the O term above, 
the present paper needs to presume 
no special theoretical position. The 
intermediation between input and 
response can stand for “mind” or 
physiology. The expression “think- 
ing factors” is utilized below, and 
refers without commitment to what- 
ever intermediates. Test items which 
have their difficulty in “mental” or 
thinking intermediators should put 
no burden upon the perceptual or 
response operations of the examinees. 
Instead, one uses a time or space gap
between S and R: one requires the
subject to draw a conclusion on 
similarity or difference, to extrapo- 
late, to find a rule, to make a judg- 
ment, to find new uses or combina- 
tions. 

The hypothesized factors now pre- 
sented are grouped, to some extent, 
on the grounds just presented.

Table 4 presents domains and 
hypothesized factors within them, 
for ages 4-6. It also shows estimated 
ages of emergence of the proposed 
abilities as identifiable factors. The 
table also lists sources in the literature
which implicitly or explicitly
support certain of the differentiations
assumed in the array.

Three domains ought to be fairly 
certain of existence: psychomotor, 
perceptual, and psycholinguistic. The 
seven which are specified result from 
the separation of auditory from
visual perception, the separation of
gross-body psychomotor from hand- 
eye, the separation of receptive from 
expressive psycholinguistics, and the 
differentiation of mental or thinking 
from all of these. 

It is not claimed that all the hy- 
pothesized factors can be found. No 
immediate claims about orthogonal- 
ity can be made. Until the first few 
simple matrices are analyzed, one 
can make only guesses. 

The instruments listed in Table 1
(including the school readiness tests 
and the differentiated batteries) have 
abundant raw material for the con- 
struction of tentative scales having 
some promise of purity. The re- 
search literature (e.g., Gewirtz, 1948a, 
1948b) provides further ideas for 
items seeming to represent relatively 
narrow spectrums of ability like 
those of “reference” factor tests.
Hence, some item sources pertinent 
to the speculated abilities have been 
entered into Table 4. 


The Motor Domains 


Two general groups of motor fac- 
tors are presumed: (a) the whole- 
body, in which gravity is defied in 
postural and locomotor achievements 
and (b) the hand-eye or manipula- 
tional. The developmental psycholo- 
gist will quickly note that the motor 
items in baby scales, in the Bayley- 
California (1935) and the Oseretsky 
(Sloan, 1955) can be easily subdivided 
into whole-body and hand-eye items. 
Gesell did this a generation ago in 
separately listing motor and adap- 
tive, while Griffiths (1954) utilized 
similar rubrics. In a more sophisti- 
cated way, Guilford (1958) provides 
a structure which distinguishes “gross” 
psychomotor factors from those of 
specified parts of the body. 

Taking useful reaction to gravity 
as the first expression of psychomotor 
ability, then the earliest response 
occurs in the first month. The normal 
baby shows momentary maintenance 
of head posture when held erect, and 
can lift his face from prone and turn 
it so as to breathe. The onward 
development includes postural and 
locomotor achievements against grav- 
ity. There being no factor studies at 
preschool ages to give leads, the ob- 
servations of Gesell and others must 
remain the best sources of the guesses. 
Whole-body motor is differentiated from 
hand-eye within the first year, with 
low congruence believed to occur be- 
fore two years. Among whole-body 
activities, the antigravity and the 
locomotor are different by “face” 
appearance, and clearly separate in 
adulthood (Guilford, 1958). This 
distinction appears in the words 
“static” and “dynamic” in Table 4. 
Within each of the whole-body 
and hand-eye domains, differentia- 
tion should be easily observable be- 
fore school age. In fact, perusal of 
Table 4 shows that greater differenti- 
ation is predicted in the psychomotor 
than in other domains. This particu- 
larization is suggested by the low 
intercorrelations of various motor 
tests (Jones & Seashore, 1944). Adult 
motor abilities are surprisingly spe- 
cific (Seashore, 1951) and differen- 
tially trainable (Fleishman, 1957). 

There are likewise no factorial 
studies for later preschool age to 
utilize for prediction. There are 
normative observations of motor 
progress till about first grade, and 
data from physical education, fitness 
testing, and the Oseretsky scale, 
thereafter. What is known factorially 
comes from the physical education 
literature (e.g., McCloy & Young, 
1954) and from “human engineering” 
(e.g., Fleishman, 1953, 1957). The 
subjects of the former are secondary 
school or college learners and the 
domain is usually whole-body per- 
formance; the latter typically has 
adult subjects and a hand-eye do- 
main. A “psychomotor structure” 
presented by Guilford (1958) puts 
the factor work into one perspective 
and is the basis for the motor domains 
proposed here. 

There are age and sex differences 
in times of reaching maxima. The 
development in girls in vigorous 
athletic performance goes into a 
plateau or declines at 12 or 13 years, 
while that of boys continues to gain 
till 18 or 20 years (Espenshade, 1940; 
Jones, 1949). Measures of balance 
and coordination plateau for both 
sexes in adolescence (Jones & Sea- 
shore, 1944) while muscular strength 
grows apace in both sexes but more 
so in boys, for more years (Jones, 
1949). Hence, while it is too early to 
anticipate the ages of the various 
factor appearances, strength should 
certainly be one of the first to emerge 
as a factor separate from those men- 
tioned and within both whole- and 
part-body activities. 

To sum up on motor factors, whole- 
vs. part-body should be evident 
within 2 years; a nearly full display 
like the adult structure of Guilford 
is expected by 5 years of age. 


Visual Perception 


The utilization of exteroception 
and somesthesis is observed within 
the first weeks of life. One may see 
primitive awareness or discrimina- 
tion at 1 month or earlier, in a change 
in conduct at a change in the stimulus 
field (visual pursuit, cessation of 
movement at a sound, etc.). At 3 or 
4 months the primitive awareness 
has become recognitive, perhaps only 
at a conditioning level, by the crite- 
rion of anticipatory behavior at sight 
of bottle or mother. The word “per- 
cept” is deserved at 6 months if not 
earlier in that the infant seems to be 
able to carry a recognition past the 
immediate time or space. Discrimi- 
nating, recognizing, and percept build- 
ing must precede, in all logic, the use 
of receptive input when that use is 
more than reflexive. The order of 
items in the Gesell scale, for example, 
shows recognitive behavior before 
adaptive positive response. 

A distinction between perception 
and spatial ability should be evident 
at 3 years, according to Thurstone 
(1948). Kelley’s study identified 
two visual factors, one probably spa- 
tial, in subjects whose ages reached 
down to 3½. The Merrill-Palmer 
(Stutsman, 1931) and the Stanford- 
Binet (Terman & Merrill, 1937) have 
items which appear to be both of the 
speedy differentiating kind and of 
the visualization sort at several 
levels. In a study relating reading to 
various Thurstone perceptual tests, 
Goins (1958) identified two factors. 
The first appeared mainly in the 
timed tests while the second might be 
called spatial, as it appeared on un- 
timed tasks of pattern completion, 
copying, reversals, etc. Of pedagogi- 
cal significance is that the first grade 
reading test loaded on the two fac- 
tors .346 and .609, respectively. If 
these loadings surprise the noneduca- 
tionist, it should be recognized that 
primary reading is largely “saying 
words out loud” with little or no V 
at the outset. At older ages other 
studies (except a few British ones; 
see Vernon, 1950) established that 
space is separable from P, and the 
complexity becomes considerable in 
adulthood (French, 1957; Guilford, 
1956). 

In summary, therefore, it is pro- 
posed that visual perception will be 
seen as noncongruent with motor and 
other abilities at 6 months and that 
by 3 years will differentiate into 
separably measurable if not orthog- 
onal perceptual vs. space factors. 


Auditory Perception 


It is necessary to point out that 
the distinction made here between 
visual and auditory perception is on 
theoretical grounds. The classic 
Thurstone (1938) work on perception 
was exclusively on visual. The sub- 
ject of channel differences intrigues 
the communication people (Osgood 
& Sebeok, 1954) and was suggested 
by Thurstone himself (1948). 

Evidence for more than reflexive 
use of auditory input can be detected 
as early as evidence for similar use of 
visual input. That is, within the first 
month, a normal baby is quieted by 
sound or otherwise changes his ac- 
tivity. However, it is less clear how 
auditory skills proliferate, for the 
necessary testing confuses auditory 
discriminatory skills with word se- 
mantics. While a child who selects 
“bear” separately from “pear” in a 
“spondee test” clearly shows ability 
to distinguish the two initial sounds, 
one cannot tell how early this per- 
ceptual distinction occurs by simple 
testing before a good “auding” vo- 
cabulary has been acquired. It is 
necessary to use differential condi- 
tioning or discrimination learning. 
It is therefore necessary, in practical 
testing, to ignore auditory figural 
discrimination vs. auditory semantic 
factors at their earliest and to limit 
present predictions of high refine- 
ment of factors to about second grade 
testability. 
But if the resources for extensive 
investigation become available, it is 
predicted that such factors analogous 
to the visual can be detected by 2 
years and before: space orientation 
via sound, differential reaction to 
word-sound phonemes apart from 
meaning, pitch discrimination, and 
many others. 

Linguistic 

Receptive. The distinction between 
receptive and expressive language 
was known in the literature of aphasia 
before Thurstone distinguished W 
from V (cf. French’s review, 1951). 
The distinction is demanded in com- 
munications theory (e.g., Osgood & 
Sebeok, 1954). In addition, channel 
differences are also suggested by 
these theorists as well as by the 
auding literature from educational 
psychology (Brown & Caffrey, 1952; 
Caffrey, 1955; Sassenrath & Holmes, 
1956) and experience with slow 
learners (Durrell & Sullivan, 1958; 
Kirk & McCarthy, 1950). Thus, one 
must not yet assume that an auding 
factor in prereading ages is congruent 
with later V developed with printed 
tests. 

Receptive factors are hypothesized 
as shown in Table 4. The auditory 
channel includes a factor of speech- 
sound perception. Difficulties in 
testing for it were mentioned just 
above. One might anticipate it at 
those levels at which it must be 
presumed present—before 1 year of 
age when a child shows evidence of 
differential understanding of words 
before he can use them expressively. 
The hypothesized visual comprehen- 
sion factor would be only primitive 
as yet. Its origins may be detectable 
at ages 5 and up by items from readi- 
ness tests in which meaning is ob- 
tained, for example, in specifying the 
significance of a traffic signal. Note 
that both auditory and visual per- 
ceptive factors are located in both 
the linguistic and the perceptual domains. 

Expressive. The array here also 
distinguishes the vocal from the 
graphic-gestural modes of expression. 
Since the age of interest here is pre- 
school, the written language could at 
best be rudimentary; at kindergarten 
the ability to hold a pencil, use it, and 
print one’s name is generally ex- 
pected, little more. Psychomotor 
factors which are listed under the ap- 
propriate heading are repeated here. 


Thinking 


Referring back to the S-O-R dis- 
cussion, “thinking” refers to that 
which is intermediate between dis- 
crimination of the stimulus matrix 
and the making of the response. The 
intermediation may be memory or 
visualization, rule-finding or abstract- 
ing. In any case, a test item needs to 
have its difficulty in O and not in 
stimulus discrimination or response 
refinement, and certainly not in word 
knowledge as such. For children of 
5 or 6, the verbal problem is best 
avoided by simple omission. The 
choice of item content is limited 
largely to elemental figural materials 
among which discriminations are 
known to be made in the first year of 
life. 

Hypothesized are memory span, 
abstracting, and reproduction of fig- 
ural models. The last-mentioned is 
probably complex; in adult work the 
Kohs and the pattern copying tasks 
are loaded on space but also on rea- 
soning, perceptual, verbal, and spe- 
cific factors (French, 1951). Obvi- 
ously it is going to cause some prob- 
lems to separate space from thinking 
and perhaps this task cannot be 
accomplished so long as figural ma- 
terials are used. 


Memory span of course is easily 
obtained by both verbal and gestural 
means, provided the contents are 
“easy” to begin with. Digit span for 
the normally cultured child is satis- 
factory. The Knox cubes provide 
another span-measuring process of 
reasonably apparent purity. 

Further speculation about the 
thinking factors is probably prema- 
ture. There is insufficient experience 
at preschool ages with analogs of, 
for example, what Guilford would 
call productive thinking. Existing 
Binet and WISC items are of three 
main sorts: immediate memory span; 
perceptual discriminations and iden- 
tifications; and information bits 
brought forth by items of the sort, 
What should you do when, Which is 
prettier, How many, etc. 

It is not till the age-levels of 6 and 
7 that the Stanford-Binet uses pro- 
ductive thinking items. At these age- 
levels one finds verbal analogies, 
picture absurdities, and similarities 
and differences. Defining thinking 
as the processing of information to 
produce a conclusion, then thinking 
is so introduced. Such items are 
concerned with semantic content. 
Symbolic content also is rare at 
younger levels. It was already pointed 
out that one competent team found 
no “number,” a symbolic ability, in 
the age bracket 5 to 7 years (Thur- 
stone & Thurstone, 1953). The mak- 
ing of rhymes, which would be an- 
other ability in symbolic content, is 
called for in the Stanford-Binet at 
the 9-year level. 

The WISC (Wechsler, 1949) is 
another scale in common use which, 
like the Binet, essentially lacks 
thinking items, as above described, 
below the mental-age equivalent 
of about 7. The first items in the 
Arithmetic subtest require only count- 
ing. But the WISC has some verbal 
analogies with mental-age values at 
around 5 and 6 years. 

Items calling for “divergent” pro- 
ductive thinking are almost non- 
existent in mental tests. The WISC 
has none. The Binet (Form L) calls 
for word-saying at age 10. Monroe’s 
(1935) old readiness test calls for 
naming of objects in specified classes 
(e.g., animals). 

Hence the preschool age-levels of 
popular batteries are barren of think- 
ing items; the speculator has few 
reference points on which to develop 
a system. The many observations of 
Piaget should not be ignored, nor 
those of American investigators on 
problem solving in the young (a re- 
cent example, Braine, 1959). But 
the aggregate of items and ideas 
from these scattered sources is not 
large. A considerable amount of 
inventive elaboration is required if 
even a modest 6-year-old model of 
adult-level structure is wanted. 


SUMMARY 


This paper has reviewed the situa- 
tion of ability testing at the infant 
and preschool ages. The available 
instruments have, on the one hand, 
failed to predict “later IQ,” for what- 
ever value later IQ has as a criterion. 
On the other hand, the spectrum of 
abilities tested, which if anything 
should broaden as the child grows, is 
caused to narrow down by the use of 
technical criteria of age-progression 
and internal consistency in item se- 
lection. The previous factorial stud- 
ies at preschool ages, of which only 
one (Kelley's) may be regarded as of 
quality, give reason to believe that 
carefully prepared testing could dem- 
onstrate that more than a G is pres- 
ent. Finally, a series of factors is 
hypothesized, representing seven do- 
mains of whole-body and hand-eye 
psychomotor, visual and auditory 
perception, receptive and expressive 
psycholinguistics, and thinking or 
mental. The hypotheses are made 
for ages 4-6 years, with some specu- 
lations as to age of earliest emergence 
of the factor. Illustrative item-types 
and supporting literature are given. 